UNCLASSIFIED 


AD NUMBER

AD253273 

LIMITATION  CHANGES 
TO: 

Approved  for  public  release;  distribution  is 
unlimited. 


FROM: 

Distribution authorized to U.S. Gov't. agencies
and their contractors;
Administrative/Operational Use; JAN 1961. Other
requests shall be referred to Air Force
Cambridge Research Laboratories, Bedford,
Massachusetts.


AUTHORITY

AFCRL ltr, 3 Nov 1971


THIS  PAGE  IS  UNCLASSIFIED 



armed  services  technical  information  agency 

ARLINGTON  HALL  STATION 
ARLINGTON  12.  VIRGINIA 


UNCLASSIFIED 




NOTICE: When government or other drawings, specifications
or other data are used for any purpose
other than in connection with a definitely related
government procurement operation, the U. S.
Government thereby incurs no responsibility, nor any
obligation whatsoever; and the fact that the Government
may have formulated, furnished, or in any way
supplied the said drawings, specifications, or other
data is not to be regarded by implication or otherwise
as in any manner licensing the holder or any
other person or corporation, or conveying any rights
or permission to manufacture, use or sell any
patented invention that may in any way be related
thereto.


MURRAY  E  SHERRY 


JANUARY  1961 


ELECTRONICS  RESEARCH  DIRECTORATE 
AIR FORCE CAMBRIDGE RESEARCH LABORATORIES
AIR FORCE RESEARCH DIVISION
AIR  RESEARCH  AND  DEVELOPMENT  COMMAND 
UNITED  STATES  AIR  FORCE 
BEDFORD  MASSACHUSETTS 


AFCRL-109


SYNTACTIC ANALYSIS IN AUTOMATIC TRANSLATION


Murray  E.  Sherry 


This  report  was  originally  published  as  Report 
NSF-5  on  Mathematical  Linguistics  and  Automatic 
Translation to the National Science Foundation
by the Computation Laboratory of Harvard University.


Project  5632 
Task  56325 


January 1961


Computer  and  Mathematical  Sciences  Laboratory 
Electronics  Research  Directorate 
Air  Force  Cambridge  Research  Laboratories 
Air  Force  Research  Division  (ARDC) 

United  States  Air  Force 
Bedford,  Massachusetts 


SYNTACTIC  ANALYSIS  IN  AUTOMATIC  TRANSLATION 


A  thesis  presented 
by 

Murray Elliot Sherry
to 

The  Division  of  Engineering  and  Applied  Physics 

in partial fulfillment of the requirements

for  the  degree  of 
Doctor  of  Philosophy 
in the subject of
Applied Mathematics


Harvard  University 
Cambridge,  Massachusetts 
August 1960


PREFACE 
SYNOPSIS


This  thesis  is  concerned  with  a  method  for  the  syntactic 
analysis  of  Russian  sentences.  Applied  to  automatic  translation,  this 
method  is  divided  into  a  morphological  word-by-word  phase  and  a 
syntactical  sentence-by-sentence  phase. 

An idealized canonical stem dictionary is presented, and its
significant  lexicographic  properties  are  pointed  out.  This  idealized 
dictionary  then  serves  as  a  basis  for  evaluating  the  actual  Harvard 
Automatic  Dictionary.  Aspects  of  morphological  analysis  of  the  Russian 
language  and  the  series  of  programs  written  to  carry  it  out  are  described. 
To  explain  the  practical  problems  encountered  in  an  experimental  syntactic 
analysis program, of which a detailed description is given, a new model of
natural  language  is  introduced.  A  more  detailed  outline  of  this  thesis  is 
given  in  Chapter  1. 

The  idealized  canonical  stem  dictionary,  the  method  of  morphological 
analysis  of  Russian,  the  construction  of  the  new  model  of  natural 
language  and  substantial  aspects  of  the  realization  of  an  operating 
experimental  syntactic  analysis  program  represent  efforts  of  the 
writer. 
Chapter 1
INTRODUCTION

A  block  diagram  of  an  idealized  system  for  translating  automatically 
between  two  languages  is  given  in  Fig.  1-1.  The  text  in  the  source  language 
is  first  transcribed  onto  some  medium  suitable  for  input  to  a  high  speed 
digital computer. Next, the text is translated into the target language by
an  appropriate  sequence  of  programs.  Finally,  the  translated  text  is 
recorded  onto  a  medium  suitable  for  reading  or  reproduction. 

To  prepare  a  text  for  processing  by  a  digital  computer,  it  is 
necessary  to  transcribe  the  text  onto  a  magnetic  tape  or  some  equivalent 
medium.  Ideally,  transcription  should  be  performed  by  a  print-reading 
machine  capable  of  identifying  the  various  types  of  letters  found  on  a 
printed page. At present, the texts which are used for experimental purposes
are laboriously typed either onto punched cards or directly onto
paper or magnetic tape. At the other end of the process, the output of
the computer program can be reproduced singly or in multiple quantities by
a number of satisfactory processes. If the recognition of diagrams and
pictures  is  desired,  further  complications  arise  at  the  transcription  and 
the  recording  steps. 

The  process  of  translating  a  text,  as  carried  out  on  a  digital  computer, 
can  be  subdivided  into  four  phases:  dictionary  lookup,  syntactic  analysis, 
semantic  analysis,  and  target  language  synthesis.  To  look  up  the  words  of 
the  source  language,  a  bilingual  dictionary,  a  sequence  of  programs  to 
control the operation of the dictionary, and a set of programs for correcting
and updating the dictionary are necessary. The grammatical roles of the
source language words, which are functions of both the lexical characteristics
of the individual words and the relationships among the words in a
sentence, are determined by the syntactic analysis. In general, more than
one target language correspondent is stored in the dictionary entry of a
source language word, since the source language word can take on different
meanings when used in different contexts. The appropriate meaning of
each source language word in its given context is selected by the semantic
analysis of the syntactically analyzed text. Finally, the target language
correspondents are inflected and rearranged, and appropriate words such as
prepositions and articles are added where required.

As an example of the complete process, consider the Russian to
English translation of the sentence: Распад каждого атома происходит
мгновенно, подобно взрыву. Possible English correspondents of the Russian
words in a Russian-English dictionary are given in Table 1-1. The analysis
of the sentence that will now be described is idealized, although some
sections of the analysis are already operational and will be discussed
in this thesis. An analysis of the individual words, based on their
lexical characteristics, is given in Table 1-2.

A syntactic study of the sentence would result in the following
analysis. Распад is nominative since it is the subject of the sentence.
Каждого is used adjectivally in the genitive possessive noun phrase
каждого атома. Происходит is the predicate head and мгновенно is an adverb
modifying the verb. Подобно взрыву is an adverbial phrase that modifies
the verb.


распад        disintegration, decomposition
каждого       of each (one), of every (one)
атома         atom
происходит    happen, occur, take place, descend
мгновенно     instantaneous, momentary
подобно       like, similar
взрыву        explosion, outburst, burst

English Equivalents for Some Russian
Words in a Russian-English Dictionary

TABLE 1-1


распад        Either nominative or accusative singular, masculine
              noun.
каждого       Pronoun used adjectivally or nominally, either
              genitive singular and either masculine or neuter,
              or accusative singular masculine.
атома         Genitive singular masculine noun.
происходит    Third person singular, present tense, indicative
              verb.
мгновенно     Adverb that can be used as a predicate in place of
              a verb.
подобно       Adverb that can be used as a predicate in place of
              a verb.
взрыву        Dative singular masculine noun.

Word-by-Word Analysis of the Sample Sentence

TABLE 1-2

The next phase would select the appropriate English correspondents.
"Disintegration" would be selected for распад, "instantaneous" for
мгновенно, "like" for подобно, and "explosion" for взрыву. Either alternative
could be used for каждого and any of the first three alternatives
for происходит.

Three of the English equivalents would be inflected. Since каждого
is used adjectivally, the correspondent would be "of each" rather than "of
each one". An "s" would be added to an English verb such as "happen" to
give "happens". Finally, a "ly" would be added to "instantaneous" to indicate
the adverbial usage. The English translation would then be:

"Disintegration of each atom happens instantaneously, like explosion."

The translation would be complete if the English articles were included:

"The disintegration of each atom happens instantaneously,
like an explosion."
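
The selection and inflection steps of this example can be traced in a few
lines of Python. The sketch below is a toy and not the Harvard system: the
dictionary entries are those of Table 1-1 (in an assumed Cyrillic spelling of
the sample words), while ROLES, select, inflect, and translate are hypothetical
names, and the role labels are entered by hand in place of a real syntactic
analysis.

```python
# Toy illustration only. Entries follow Table 1-1; the role labels and the
# three inflection rules are hand-written assumptions for this one sentence.
DICTIONARY = {
    "распад": ["disintegration", "decomposition"],
    "каждого": ["of each (one)", "of every (one)"],
    "атома": ["atom"],
    "происходит": ["happen", "occur", "take place", "descend"],
    "мгновенно": ["instantaneous", "momentary"],
    "подобно": ["like", "similar"],
    "взрыву": ["explosion", "outburst", "burst"],
}

# Hypothetical result of the syntactic analysis described in the text.
ROLES = {
    "распад": "subject",
    "каждого": "adjectival",
    "атома": "genitive noun",
    "происходит": "predicate, third person singular",
    "мгновенно": "adverb",
    "подобно": "adverbial phrase",
    "взрыву": "adverbial phrase",
}


def select(word):
    """Stand-in for semantic analysis: take the first listed correspondent."""
    return DICTIONARY[word][0]


def inflect(english, role):
    """Apply the three inflections mentioned in the text."""
    if role == "adjectival":
        return english.replace(" (one)", "")   # "of each (one)" -> "of each"
    if role == "predicate, third person singular":
        return english + "s"                   # "happen" -> "happens"
    if role == "adverb":
        return english + "ly"                  # "instantaneous" -> "...ly"
    return english


def translate(sentence):
    """Translate word by word, keeping commas; no articles are inserted."""
    output = []
    for token in sentence.rstrip(".").split():
        comma = "," if token.endswith(",") else ""
        word = token.rstrip(",").lower()
        output.append(inflect(select(word), ROLES[word]) + comma)
    text = " ".join(output) + "."
    return text[0].upper() + text[1:]


RESULT = translate("Распад каждого атома происходит мгновенно, подобно взрыву.")
```

Under these assumptions RESULT is the article-free translation quoted above;
adding "the" and "an" would require the synthesis rules that this sketch omits.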

The thesis is chiefly concerned with the second of the four processes
of Fig. 1-1. The ability to carry out the experimental analysis is
predicated on the existence of an automatic Russian-English dictionary and
its associated controlling routines. In this discussion syntactic analysis
will include both the morphological word-by-word analysis and the
sentence-by-sentence analysis described in the previous example.

The method for producing the morphological analysis can vary over a
wide range and is mainly a function of the type of dictionary used. In a
full paradigm† dictionary, which has a unique entry for every inflected
word form, it is possible to store all the known syntactic information
pertinent to each word form directly in the dictionary and read it out
whenever the given form is looked up in the dictionary. An alternative is
to store a segment of a word form common to all the paradigmatic forms
of the word rather than the whole word form. A single dictionary entry
can then represent the entire paradigm. Such a dictionary requires much
less storage space than a full paradigm dictionary because, as Giuliano
has indicated, there is an average of about ten word forms within a
Russian paradigm. So long as large-scale storage devices remain extremely
expensive to acquire and operate, the latter alternative will seem more
attractive.

† The set of all inflected forms for a given word is called the paradigm
of the word.
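
The storage trade-off between the two kinds of dictionary can be made concrete
with a small sketch. The code below is illustrative only: it uses the standard
declension of the noun атом, and the names full_paradigm, stem_dictionary,
lookup_full, and lookup_stem are hypothetical, not routines of the Harvard
Automatic Dictionary.

```python
# Illustrative sketch: one noun stored both ways.
STEM = "атом"
ENDINGS = ["", "а", "у", "ом", "е", "ы", "ов", "ам", "ами", "ах"]

# Full paradigm dictionary: one entry per distinct inflected form.
full_paradigm = {STEM + ending: {"stem": STEM, "ending": ending}
                 for ending in ENDINGS}

# Stem dictionary: a single entry represents the whole paradigm.
stem_dictionary = {STEM: {"endings": ENDINGS}}


def lookup_full(form):
    """Direct look-up; all information is read straight from the entry."""
    return full_paradigm.get(form)


def lookup_stem(form):
    """Try each split of the form into stem + ending, longest stem first."""
    for cut in range(len(form), 0, -1):
        stem, ending = form[:cut], form[cut:]
        entry = stem_dictionary.get(stem)
        if entry is not None and ending in entry["endings"]:
            return {"stem": stem, "ending": ending}
    return None
```

Here the full paradigm file holds ten entries for this one word and the stem
file holds one, at the price of an extra analysis step after look-up.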

Since the Harvard Automatic Dictionary, which is a compromise between
the two extremes, is a result of the cumulative efforts of a number
of investigators over several years, it has become difficult to isolate
the essential features of the system from the pieces that have been
incorporated to make up for previously encountered shortcomings. An idealized
canonical stem dictionary is presented in Chapter 2 to point out, on the
one hand, the significant lexicographic details of such a dictionary and to
provide, on the other hand, a basis for comparing actual dictionaries and,
particularly, for evaluating the actual Harvard Automatic Dictionary. The
idealized dictionary is described in a mathematical notation in an attempt to
ascribe clearly defined characteristics to it. Included in this chapter is a
method for the construction of the dictionary and of the individual entries, as
well as a method for the morphological analysis of text words.

The author is indebted to D. W. Davies of the National Physical
Laboratory, England, for comments which provided a point of departure for
the investigation reported in Chapter 2. Mr. Davies visited Cambridge,
Massachusetts, in December, 1959, after he and his staff had studied the
previous publications of the Harvard project. To store grammatical
information in dictionary entries, Mr. Davies outlined a scheme which is
approximately the same as the "entry function vector".

Whereas in a full paradigm dictionary the individual dictionary
entries can be precoded with all the grammatical information relevant to
each inflected form, this approach is impossible when a stem dictionary is
used. Some of the grammatical information in a Russian-English dictionary,
such as case and number, is dependent on the word endings. It is therefore
necessary to analyze the endings and stem dictionary entries after the
look-up process. As many ambiguities as possible are resolved on a
word-by-word basis to reduce the burden placed on the more complex
sentence-by-sentence syntactic analysis which follows the morphological
analysis.

The  problems  involved  in  the  word-by-word  analysis  are  discussed  in  Chapter 
3  and  the  analysis  programs  are  presented  there.  In  addition,  several  other 
programs  that  have  been  written  to  patch  the  existing  dictionary  are 
included  in  this  chapter.  The  output  of  these  programs  is  identical  with 
the  output  of  the  idealized  dictionary  described  in  Chapter  2,  although 
the  processes  differ  greatly  in  detail. 

The method of predictive syntactic analysis is based on the empirical
technique for the syntactic analysis of Russian devised by I. Rhodes of the
U.S. National Bureau of Standards. The author had the privilege of being
introduced to this technique while working with Mrs. Rhodes during the
summer of 1959. The technique is based on the premise that many Russian
sentences can be analyzed on a left-to-right pass, scanning each word of
the sentence once and in order. The syntactic role of a word in a sentence
can be determined from the syntactic roles of the words preceding it.
Moreover, on the basis of the analyzed word, it is possible to make further
predictions about the syntactic roles of the words which can follow. The
predictions are stored in a prediction pool, an approximation to a simple
pushdown store, that is, a linear array of storage devices in which
information is entered and removed from one end only according to a
"last-in-first-out" technique.
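
The behavior of a prediction pool can be suggested by a short sketch. What
follows is a minimal illustration under invented assumptions, not the rules
devised by Rhodes or coded in the experimental program: the word classes, the
tables FULFILS and NEW_PREDICTIONS, and the function analyze are all
hypothetical. The pool is a Python list used as a pushdown store, with the top
of the store at the end of the list; unfulfilled predictions lying above the
one that a word satisfies are simply discarded, which is one crude way of
approximating, rather than strictly obeying, last-in-first-out order.

```python
# Hypothetical toy grammar: the prediction each word class can fulfil ...
FULFILS = {
    "noun_nom": "subject",
    "noun_gen": "genitive modifier",
    "verb": "predicate",
    "adverb": "adverbial modifier",
}

# ... and the predictions it adds to the pool for the words that may follow.
NEW_PREDICTIONS = {
    "noun_nom": ["genitive modifier"],
    "noun_gen": ["genitive modifier"],
    "verb": ["adverbial modifier"],
    "adverb": ["adverbial modifier"],
}


def analyze(word_classes):
    """Single left-to-right pass; returns (roles, remaining pool)."""
    pool = ["predicate", "subject"]      # initial predictions; top is last
    roles = []
    for word_class in word_classes:
        wanted = FULFILS[word_class]
        # Discard predictions above the one this word can fulfil.
        while pool and pool[-1] != wanted:
            pool.pop()
        if not pool:
            raise ValueError("no prediction fulfilled by " + word_class)
        roles.append(pool.pop())         # the fulfilled prediction is removed
        pool.extend(NEW_PREDICTIONS[word_class])
    return roles, pool


# Word classes assumed for a shortened sentence: noun, genitive noun, verb, adverb.
ROLES, POOL = analyze(["noun_nom", "noun_gen", "verb", "adverb"])
```

A sequence that begins with an adverb exhausts the pool in this toy grammar and
raises an error, a small-scale instance of an unsuccessful single pass.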

In an effort to explain the practical problems arising in the predictive
analysis of natural languages, a model of natural language has been
developed in Chapter 4. The algorithms which operate on the model language
show the essential usefulness of the fundamental concepts of the predictive
analysis technique.

An  experimental  program  now  in  operation  for  the  syntactic  analysis 
of  Russian  sentences  is  described  in  detail  in  Chapter  The  aspects  of 
Russian  grammar  which  have  been  coded  in  the  experimental  predictive 
syntactic  analysis  program  are  discussed,  and  examples  are  given  of  both 
successful  and  unsuccessful  attempts  at  analysis. 

One  implication  of  the  model  is  that  a  single  pass  of  a  sentence 
through  a  predictive  analysis  program  does  not  yield  a  successful  syntactic 
analysis in all cases. It will be necessary to provide for supplementary
passes to correct errors discovered in the initial pass. Many of the
errors  are  easily  detected  and  a  scheme  for  the  systematic  correction  of 
the  errors  on  subsequent  passes  seems  promising. 
When discussing a subject such as syntactic analysis, it is important
to distinguish among the use, mention, and representation of a word.
Conventionally, a word is used to specify a distinct object, a certain
action, etc. But when the word itself is the subject of discussion, its
mention facilitates the treatment of the word as an abstract entity, while
the representation of a word permits the individual characters to be
considered as separate entities. The problem of distinguishing the use,
mention, and representation of signs is illustrated by the following
examples utilizing Oettinger's convention.

Boston  is  a  city.  (Use) 

Boston  is  an  English  word.  (Mention) 

"Boston" is the conventionally spelled representation of Boston.

The  asterisk  is  added  to  the  underscore  to  denote  mention,  as  distinct  from 
an  underscore  used  alone  merely  for  emphasis. 

This  notation  will  be  used  only  when  required  for  the  sake  of 
clarity.  In  Chapters  2  and  3,  in  particular,  it  will  be  used  liberally. 
CHAPTER 2

AN  IDEALIZED  CANONICAL  STEM  DICTIONARY 

1.  Introduction 

In  the  field  of  data  processing  in  general,  the  description  of 
complex systems presents difficult problems. In particular, it has
proved  difficult  to  describe  with  sufficient  detail  and  accuracy  the 
operation  of  nonnumerical  systems.  Numerical  work  can  be  set  forth  in 
mathematical  notation,  so  that  it  is  not  necessary  to  rely  on  detailed 
programs  to  describe  the  procedures  involved.  Nonnumerical  problems 
such  as  automatic  translation  have  similar  details,  but  there  is  no 
universal  notation  for  the  processes  involved  or  the  entities  to  be 
manipulated.  In  general,  the  procedure  has  been  either  to  outline 
processes  with  flowcharts  of  increasing  complexity,  or  to  give  all  the 
details  with  the  operating  program  itself.  However,  such  a  complete 
description  makes  the  process  unintelligible.  In  particular,  this  has 
been  the  case  with  automatic  translation  where  it  is  extremely  difficult 
to design, comprehend, and evaluate such systems.

Recently Iverson has devised a technique of notation that shows
some promise of coping with the descriptive problems (Appendix A). One of
its striking merits is that it is independent of the characteristics of
specific computing machines, and once mastered is of sufficient generality
to describe a variety of processes. It seems desirable to formulate a
general process of dictionary compilation and operation in terms of this
notation.

In this chapter, an idealized canonical dictionary system is presented
for the purpose of outlining the essential features of any such system.
Besides putting into perspective the essential lexicographic problems of
translation, this exposition provides a frame of reference against which the
Harvard Automatic Dictionary and other automatic dictionaries can be compared.

A  number  of  basic  terms  are  considered  in  the  following  paragraphs. 

A  canonical  dictionary  is  one  in  which  the  canonical  form  of  a  word,  such 
as  the  nominative  singular  of  a  noun  or  the  infinitive  of  a  verb,  is  used 
as  the  basic  source  of  the  dictionary  entry  (or  entries)  necessary  to 
represent  all  the  possible  inflected  forms  of  the  word.  In  contrast,  a 
dictionary  in  which  the  entries  are  directly  generated  from  text  occurrences 
would  not  be  a  canonical  dictionary,  since  any  form  of  a  word  could  occur 
in  a  text.  Different  types  of  canonical  dictionaries  are  possible.  For 
example,  the  ordinary  dictionary  in  which  the  canonical  form  itself  is 
listed is a canonical dictionary. A second type is a canonical stem
dictionary which lists only the stems of the inflected forms, which in
turn are obtained from a canonical form. A canonical stem dictionary, to
which all further discussion in this chapter will be restricted, is useful
only insofar as the number of dictionary entries per word, which averages
about  ten  in  a  Russian  full  paradigm  dictionary  (that  is,  a  dictionary 
containing  every  distinct  inflected  form  of  a  word),  can  be  minimized. 

The grammatical attributes of a word can be divided into
lexical attributes and syntactic attributes; the former are determined by
examining individual words, while the latter can be determined only by
examining the words in context. In a highly inflected language such as
Russian, many of the grammatical attributes are lexical, while in a
relatively uninflected language like English, few of the grammatical
attributes are lexical and a correspondingly greater number are syntactic.

For example, the Russian noun стола has lexical attributes of case,
number, and gender, such that the noun is genitive, singular, and
masculine. The English noun "table", from the equivalent phrase "of the
table", has only the lexical attributes of number and gender. The genitive
case can be determined only by examining the context in which "table" is
found.

In the Russian language, the lexical attributes, which are a
desired output of a dictionary, are determined by a set of letter
combinations called desinences which occur at the ends of words. The
desinences cannot be factored systematically as, for example, in the two
forms "атом" and "атомом". In the former form the "ом" is part of the
stem, while in the latter form the rightmost "ом" is the desinence. It
is possible to define an arbitrary set of letter combinations, which will
be called affixes, that closely parallel the set of desinences, so that if
a word is considered as a string of letters, then the affixes can be
factored systematically from the end of the string. The stem is the
string of letters which remains after the affix has been removed.
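
One possible choice of the arbitrary factoring rule is to remove the longest
affix that matches the end of the string. The sketch below assumes that rule
and a small sample affix list; both AFFIXES and factor are hypothetical
illustrations and are not taken from the thesis.

```python
# Hypothetical sample of noun affixes; a real list would be much longer.
AFFIXES = ["а", "у", "е", "ы", "ом", "ов", "ам", "ах", "ами"]


def factor(word, affixes=AFFIXES):
    """Split a word into (stem, affix) by removing the longest matching affix.

    The rule is purely mechanical: it looks only at the letters at the end
    of the string, never at the grammar of the word.
    """
    for affix in sorted(affixes, key=len, reverse=True):
        if word.endswith(affix) and len(word) > len(affix):
            return word[:-len(affix)], affix
    return word, ""      # no affix matched; the whole string is the stem
```

With this rule "атомом" factors into the stem "атом" and the affix "ом", while
"атом" factors into "ат" and "ом", although its desinence is null. The affix
thus parallels the desinence without always coinciding with it, which is why
the lexical attributes must later be mapped from desinences onto affixes.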

The  rest  of  this  chapter  is  devoted  to  a  discussion  of  the  problems 
of  compilation  and  operation  of  a  canonical  stem  dictionary.  The  problems 
have been divided into three general areas:

(1) Since it is the set of affixes that is factored in the
operation of a stem dictionary, the lexical attributes which are
associated with the desinences must be associated with the affixes. In
Sec. 2 a scheme is developed for determining the mapping of the desinences
onto the affixes in order to associate the lexical attributes with the affixes.

(2) Frequently, the stems of two different words are identical
although the sets of affixes associated with the individual stems do not
intersect at all. A technique by which a list of the affixes associated
with a given stem could be stored in the dictionary entry would reduce the
nonessential ambiguity in the dictionary file (Sec. 3). Dictionary look-up
would be simplified if this technique also provided for the storage of the
lexical attributes that are associated with the affixes (as discussed in
the previous paragraph).

(3) The look-up process, which has to be repeated for every
word looked up in the dictionary file (Sec. 4), should be as simple as
possible. A list of the lexical attributes of the text word should be
included in the output of the process.

2.  Reference  Matrices 

It  is  convenient  to  list  the  lexical  properties,  such  as  case  and 
number  for  nouns,  relevant  to  the  operation  of  an  automatic  dictionary  before 
preparing  a  procedure  for  compilation  or  look-up.  Since  in  a  Russian  stem 
dictionary  the  affix  of  a  word  is  used  to  determine  the  lexical  properties 
of  the  word,  the  list  should  consist  of  all  the  possible  affixes  and  all  the 
possible  lexical  attributes.  A  reference  matrix  is  such  a  list  (Fig.  2-1). 
One  or  more  reference  matrices  can  be  used  for  an  automatic  stem  dictionary 
of  a  given  language,  depending  on  the  number  of  attributes  and  the 
separability  of  any  of  the  sets  of  attributes  into  disjoint  classes. 

For a Russian stem dictionary, three productive morphological types
$t_m$ (noun, adjective, and verb) have been chosen, and a reference matrix has
been associated with each of these types. Although as many reference
matrices as desired may be chosen, a desire for simplicity dictates a
search for a natural decomposition of the set of Russian words into
several sets of morphological types of words, in order to avoid
encountering unnecessary complications, several of which will be illustrated
later. Similarly, any arbitrary set of affixes may be used, although the
more closely the set of affixes parallels the set of desinences, again, the
fewer unnecessary complications will be encountered.

[Fig. 2-1: two-row matrix; first row, lexical attribute codes (Ns, As, Ip,
Pp, ...); second row, the affixes that can signify them.]

Reference Matrix for Noun Morphological Type
Fig. 2-1
A vector $\lambda_m$ of lexical attributes associated with a morphological
type $t_m$ is defined; for example, if $t_m$ is a noun type,
$\lambda_m = [\lambda_{m_1}, \lambda_{m_2}, \ldots]$,
where each component of $\lambda_m$ is a unique lexical attribute such as
nominative singular or genitive plural. The symbols $t_m$ and $\lambda_m$, as
well as the symbols that will be introduced in succeeding paragraphs, are
summarized in Table 2-1.

A vector, each of whose components is one of the Russian desinences,
and which includes every desinence once and only once, is designated $\delta$.
Likewise, a vector, each of whose components is a Russian affix factored
from a string of letters by an arbitrary algorithm, and which includes
every affix once and only once, is designated $\alpha$. The order of the
components in the vectors $\delta$ and $\alpha$ is immaterial. The vector
$\mu_m(x)$ represents the lexical attributes (there may be more than one) in
type $t_m$ of an affix or desinence $x$; thus, where "ом" and "а" are
desinences, $\mu_m$("ом") = [instrumental singular], while $\mu_m$("а") =
[nominative singular, genitive singular, accusative singular, nominative
plural, accusative plural].

Symbol          Function

$\delta$        Desinence vector
$\alpha$        Affix vector
$t_m$           Morphological type
$\lambda_m$     Lexical attributes of morphological type $t_m$
$\mu_m(x)$      Lexical attributes of desinence or affix $x$
$V$             Reference matrix
                  $V^1$ - lexical attribute
                  $V^2$ - affix
$V^*$           Auxiliary reference matrix
                  $V^{*1}$ - lexical attribute
                  $V^{*2}$ - desinence
$F(x)$          Arbitrary factoring operation on word $x$
$\pi(x)$        Paradigm representation of word $x$
(illegible)     Entry function vector
$\alpha_w$      Affix of word $w$
(illegible)     Lexical attributes of word $w$
$i, j, k$       Indices
A               Null formula

Definition of Symbols
TABLE 2-1

All the information known about a morphological type prior to the
construction of a reference matrix can be summarized in the list of
lexical attributes, the list of all possible affixes, the list of all
possible desinences, and the set of vectors $\mu_m(x)$. However, since affixes
and not desinences are factored from words, a condensed representation of
the set of vectors $\mu_m(x)$ is needed. The rest of this section is devoted
to the problem of obtaining this condensed representation.

For each lexical attribute of a given morphological type a two-row
submatrix is constructed such that the components in the second row
represent affixes that can signify the lexical attribute. Each component
in the first row represents the lexical attribute itself (Fig. 2-2).

        Pp    Pp
        ах    ях

Submatrix of the Lexical Attribute Prepositional
Plural of the Noun Morphological Type

Fig. 2-2

The submatrices of all the lexical attributes are then joined to form the
reference matrix (Fig. 2-1). The ordering of the submatrices must coincide
with the ordering of the lexical attributes in $\lambda_m$, but the ordering of
the columns within each submatrix is immaterial. Each affix can occur no more
than once in a submatrix, but can be repeated in any number of submatrices.
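
The joining of submatrices into a reference matrix can be pictured with a small
sketch. It is illustrative only: the attribute codes Ns, Ip, and Pp follow the
figures, the code Gs (genitive singular) and the sample affix data are
assumptions, and reference_matrix and attributes_of are hypothetical names.

```python
# Assumed fragment of the noun type. LAMBDA fixes the order of the submatrices.
LAMBDA = ["Ns", "Gs", "Ip", "Pp"]

# Assumed sample data: the lexical attributes each affix can signify.
AFFIX_ATTRIBUTES = {
    "а": ["Ns", "Gs"],
    "ами": ["Ip"],
    "ями": ["Ip"],
    "ах": ["Pp"],
    "ях": ["Pp"],
}


def reference_matrix(attributes, affix_attributes):
    """Join one two-row submatrix per lexical attribute, in attribute order."""
    first_row, second_row = [], []
    for attribute in attributes:
        for affix, signified in affix_attributes.items():
            if attribute in signified:
                first_row.append(attribute)   # the lexical attribute itself
                second_row.append(affix)      # an affix that can signify it
    return [first_row, second_row]


def attributes_of(matrix, affix):
    """Read back every lexical attribute listed for an affix."""
    return [attribute for attribute, column_affix in zip(*matrix)
            if column_affix == affix]


V = reference_matrix(LAMBDA, AFFIX_ATTRIBUTES)
```

In this sample the affix "а" appears once in the Ns submatrix and once in the
Gs submatrix, so an affix is never repeated within a submatrix but may recur
across submatrices, as the text requires.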
The operations necessary to construct a reference matrix from the affix
vector, the desinence vector, the lexical attribute vector and the set
of vectors $\mu_m(x)$ are shown in Program 2-1 and explained in the following
paragraphs.

Prior to constructing the reference matrix $V$, it is convenient to
construct an auxiliary matrix $V^*$ that resembles the reference matrix in form
but whose second row is a row of desinences instead of affixes. The matrix
$V^*$ is set to null in step 1.

To iterate over all the lexical attributes in $\lambda_m$, an index $i$ is
initialized in step 2 and decremented in step 3. A minor loop for each
desinence is initialized in step 4. Step 5 sets the logical vector $p$ to null,
and step 6 decrements the index $j$ of the minor loop.

In step 7, the component $\lambda_{m_i}$ of $\lambda_m$ is treated as a vector
of one component.† This component is mapped onto the lexical attributes
$\mu_m(\delta_j)$ of the component $\delta_j$ of $\delta$. Since
$\lambda_{m_i}$ has but one component, the resultant of the
mapping is an integer. The logical reduction substitutes a "1" in the place
of any integer other than "0", and the "1" or the "0" is left-adjoined to $p$.
This iterative process is repeated until every component of $\delta$ has been
scanned. For example, if $\lambda_{m_i}$ = [nom. sing.] and $\delta_j$ = "а",
then $\mu_m(\delta_j)$ = [nom. sing., gen. sing., accus. sing., nom. plur.,
accus. plur.], $[\mu_m(\delta_j) \leftarrow \lambda_{m_i}] = 1$, and since
$1 \neq 0$, a "1" is left-adjoined to $p$. If $\lambda_{m_i}$ = [nom. plur.]
for the same $\delta_j$, then $[\mu_m(\delta_j) \leftarrow \lambda_{m_i}] = 4$,
but since $4 \neq 0$, a "1" would still be left-adjoined to $p$. If $\delta_j$
= "ом", then $\mu_m(\delta_j)$ = [instr. sing.],
$[\mu_m(\delta_j) \leftarrow \lambda_{m_i}] = 0$, and a "0" is left-adjoined
to $p$.

† Throughout this and succeeding programs, a string of characters will be
considered both as a one-component vector and as a vector with each
character of the string a component of a vector. Thus, u = ["green"] is
a one-component vector, but y = ["g", "r", "e", "e", "n"] is a five-component
vector. It is also possible that the entire string will be but one component
of a vector in another context; for example, the three-component vector W =
["one", "green", "leaf"].


The  resultant  logical  vector  _p  of  this  iterative  loop  is  of  the 
dimension  of  8  and  has  a  "1"  in  each  location  corresponding  to  the 
components  of  8  which  could  have  the  lexical  attribute  X^  in  tj^^* 

In step 8, δ is compressed by ρ. The compressed subvector of δ is
left-adjoined to the second row of V*. The components of the subvector are
the desinences that have the lexical attribute λ_i in t_m; for instance,
the subvector for the lexical attribute prepositional plural in the noun
type would be [ах, ях] (Fig. 2-2). A vector, each of whose components is
λ_i, and of the dimension of the desinence subvector, is left-adjoined to
the first row of V* in step 9.
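Steps 1 to 9 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the dictionary `attrs_of` (standing for the vectors of lexical attributes of each desinence), and the transliterated desinences are not from the report, and the sketch appends columns in forward order instead of decrementing indices and left-adjoining, which yields the same left-to-right arrangement.

```python
def build_auxiliary_matrix(attributes, desinences, attrs_of):
    """Return V* as a list of (lexical attribute, desinence) columns."""
    columns = []
    for attr in attributes:
        # Steps 4-7: logical vector with a 1 where the desinence can carry attr.
        rho = [1 if attr in attrs_of[d] else 0 for d in desinences]
        # Step 8: compress the desinence vector by rho.
        selected = [d for d, bit in zip(desinences, rho) if bit]
        # Step 9: pair the attribute with every selected desinence.
        columns.extend((attr, d) for d in selected)
    return columns


# Illustrative data: three attributes and four transliterated desinences.
attributes = ["Ds", "Ps", "Ip"]
desinences = ["am", "ami", "e", "i"]
attrs_of = {"am": [], "ami": ["Ip"], "e": ["Ds", "Ps"], "i": ["Ps"]}

v_star = build_auxiliary_matrix(attributes, desinences, attrs_of)
print(v_star)
```

Each tuple is one column of the auxiliary matrix: the first element belongs to the first row (lexical attributes) and the second to the second row (desinences).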

This entire process is repeated until a submatrix has been adjoined
to V* for every lexical attribute in λ_m. In the next sequence of steps,
each desinence (column) in V* is replaced by a submatrix of affixes in V.
Any of these affixes might be factored from a string of letters ending in
the desinence; for example, the affixes "у", "ему", or "ому" might be
factored from a string of letters ending in the desinence "у".* The
arbitrary factoring operation used in this process is designated F(x), where
x is the string of characters being factored, and a logical tail vector τ
is defined as the result of the operation F on the string x, τ = F(x). The
weight of the logical tail vector τ is equal to the dimension of the
affix factored by F, and the dimension of τ is equal to the dimension of
the original string of letters x.

*The factoring of the string "ему" or "ому" is an example of false factoring
if the desinence is in fact "у" (Sec. 3 *30).

The reference matrix V is set to null in step 10. An iterative
process that will operate on each column of V* is initialized in step 11
and the index k is decremented in step 12.

A minor loop to scan the affix vector α for each k is initialized in
step 13. The vector ρ is set to null in step 14 and the index l is
decremented in step 15.


In step 16, the affix α_l in α is considered a vector with the
individual letters of the affix as components of the vector. This vector
is compressed by a logical tail vector τ whose weight is equal to the
dimension of the desinence V*²_k (the k-th component of the second row of
V*), which is also considered a vector with letters as components in this
process. This step ensures compatibility between the elements of ρ and
V*²_k in the next step. Thus, if V*²_k = "у" and if α_l = "ему", then the
dimension of V*²_k is 1, τ = [0 ... 0 1], and τ/α_l = "у". The element
adjoined to ρ has the same dimension as V*²_k. It should be noted that by
the definition of a logical tail vector, if the weight of τ exceeds the
dimension of α_l, then α_l remains unchanged, as in the case of a desinence
longer than the affix α_l = "у", where τ/α_l = "у".

In step 17, the components of ρ are logically reduced by a vector
each of whose components is V*²_k, the desinence in column k of V*. The
resultant logical vector is used to compress α. The compressed vector j
contains, as components, all the affixes that might be factored by the
arbitrary factoring algorithm operating on a string of letters ending in
the desinence V*²_k.

If the dimension of j is zero (step 18), then no affix that is at
least as long as the desinence represented by V*²_k exists, and the desinence
can be replaced only by an affix shorter than the desinence. The desinence
is factored by F in step 19 and the resulting affix is substituted for the
desinence in V*, after which the process returns to step 13. This path can
be followed only once per V*²_k, since the dimension of j is not zero in
step 18 once the affix has been substituted for the desinence.

If the dimension of j is not zero at step 18, then one or more affixes
that might be factored by the algorithm when operating on a word ending in
the desinence have been found. The program transfers to step 20 and j is
left-adjoined to V², the second row of V. In step 21, a vector, each of
whose components is V*¹_k and of the same dimension as j, is left-adjoined
to V¹, the first row of V.

The process of steps 12 to 21 is repeated for every desinence in V*
until the reference matrix V has been completely generated.
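Steps 10 to 21 can be sketched in the same way. The sketch below is an assumption-laden illustration, not the report's Program 2-1: `tail` mimics compression by a logical tail vector, `longest_suffix` is a hypothetical stand-in for the arbitrary factoring algorithm F, and the transliterated affixes are invented sample data.

```python
def tail(s, n):
    """Last n letters of s; s itself when it has n letters or fewer."""
    return s[-n:] if 0 < n < len(s) else s


def build_reference_matrix(v_star, affixes, factor):
    """Replace each desinence column of V* by a submatrix of affixes."""
    columns = []
    for attr, desinence in v_star:
        # Steps 16-17: keep the affixes whose tail equals the desinence.
        matches = [a for a in affixes if tail(a, len(desinence)) == desinence]
        if not matches:
            # Steps 18-19: no affix ends in the desinence, so factor the
            # desinence itself and scan the affix vector once more.
            desinence = factor(desinence)
            matches = [a for a in affixes if tail(a, len(desinence)) == desinence]
        # Steps 20-21: adjoin the affixes and the repeated attribute.
        columns.extend((attr, a) for a in matches)
    return columns


def longest_suffix(affixes):
    """Hypothetical stand-in for F: the longest listed affix ending a string."""
    def factor(string):
        found = [a for a in affixes if string.endswith(a)]
        return max(found, key=len) if found else ""
    return factor


affixes = ["am", "ami", "e", "ie", "i"]
v_star = [("Ds", "e"), ("Ps", "e"), ("Ps", "i"), ("Ip", "ami")]
reference = build_reference_matrix(v_star, affixes, longest_suffix(affixes))
print(reference)
```

Note that the desinence "i" is replaced by both "ami" and "i", since either affix might be factored from a word ending in that desinence.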

As an illustration of the entire process of producing a reference
matrix, a greatly simplified morphological type with only three lexical
attributes, dative singular (Ds), prepositional singular (Ps), and
instrumental plural (Ip), will be considered. The range of desinences
corresponding to these morphological types will also be limited.

The step-by-step process is outlined in detail in Table 2-2 based on
the following set of definitions:


δ = [ам, ами, е, и],

α = [ам, ами, е, ие, и],

and for each desinence,

μ(ам)  = null,
μ(ами) = [Ip],
μ(е)   = [Ds, Ps],
μ(и)   = [Ps].

Step   Other Conditions   Result

 2                        i = 4
 4                        j = 5
 7     i = 3, j = 4       ([Ps] ← [Ip]) = 0, ρ = [0]
 7     i = 3, j = 3       ([Ds, Ps] ← [Ip]) = 0, ρ = [0,0]
 7     i = 3, j = 2       ([Ip] ← [Ip]) = 1, ρ = [1,0,0]
 7     i = 3, j = 1       (null ← [Ip]) = 0, ρ = [0,1,0,0]
 8     i = 3              V*² = [ами]
 9                        V*¹ = [Ip]
 7     i = 2, j = 1       ρ = [0,0,1,1]
 9     i = 2              V* = [Ps  Ps  Ip ]
                               [е   и   ами]
 7     i = 1, j = 1       ρ = [0,0,1,0]
 9     i = 1              V* = [Ds  Ps  Ps  Ip ]
                               [е   е   и   ами]
11                        k = 5
13                        l = 6
16     k = 4, l = 1       ρ = [ам, ами, е, ие, и]
17     k = 4              (V*²_k = ρ) = [0,1,0,0,0], compressed α = [ами]
18                        V¹ = [Ip]
16     k = 3, l = 1       ρ = [м, и, е, е, и]
18     k = 3              (V*²_k = ρ) = [0,1,0,0,1]
                          V = [Ps   Ps  Ip ]
                              [ами  и   ами]
16     k = 2, l = 1       ρ = [м, и, е, е, и]
18     k = 2              (V*²_k = ρ) = [0,0,1,1,0]
                          V = [Ps  Ps  Ps   Ps  Ip ]
                              [е   ие  ами  и   ами]
       k = 1              V = [Ds  Ds  Ps  Ps  Ps   Ps  Ip ]
                              [е   ие  е   ие  ами  и   ами]

Details in the Process of Producing a Reference Matrix

TABLE 2-2

3. Dictionary Compilation

Every stem of an inflected word is stored as a separate dictionary
entry. Every dictionary entry will contain a list of the set of affixes
that can occur with the stem and the lexical attributes associated with each
affix. This information will be represented by a logical vector, the entry
function vector y, such that ν(y) = ν(V). Every "1" in the entry function
vector will correspond to the affix - lexical attribute pair of the
corresponding column of the reference matrix. For instance, if a stem in
the morphological class of Table 2-2 had the affix "е" in the dative singular,
"и" in the prepositional singular, and "ами" in the instrumental plural, then
the entry function vector of that stem is y = [1,0,0,0,0,1,1].

The compilation of a set of entry function vectors y_k (there are k
stems in the paradigm of a word) will now be considered. The reference
matrix, the paradigm of the word, and an arbitrary factoring algorithm are
necessary initially for the compilation.

The paradigm representation of an inflected word w, belonging to
class c in morphological type t_m, is denoted by the matrix Π, where ν(Π) = 2
(Fig. 2-3). All the relevant lexical attributes are listed in the first
column and all the members of the paradigm of the word are listed in the
second column of the matrix. Each row consists of a single member of the
paradigm and its associated lexical attribute.

The process for obtaining the entry function vectors for a
paradigm is described in Program 2-2.

Ns  атом
Gs  атома
As  атом
Ds  атому
Is  атомом
Ps  атоме
Np  атомы
Gp  атомов
Ap  атомы
Dp  атомам
Ip  атомами
Pp  атомах

Paradigm Representation Matrix for атом
Fig. 2-3

Step 1 defines the paradigm representation, Π, according to the
rules of the class c to which the word w belongs. In step 2, the
arbitrary factoring algorithm F is applied to the second component of
each row of Π, each of these components being considered a vector. The
resulting logical tail vector is the corresponding component of the column
vector ω. Figure 2-4 shows ω for the paradigm representation of Fig. 2-3,
using an arbitrary algorithm that factors the affixes "ом", "а", "ому",
"е", "ы", "ов", "ам", and "ах", among others.

The second component of each row of Π is compressed by the inverse
of the corresponding component of ω, and the resultant components are
components of the stem vector σ in step 3. Each component is a stem from the
paradigm representation Π; thus, using the same example,

σ = [ат, атом, ат, ат, атом, атом, атом, атом, атом, атом, атом, атом].


Program for Constructing Entry Function Vectors
for an Inflected Word

Program 2-2


ω = F(Π(атом)) =

0011
00001
0011
00111
000011
00001
00001
000011
00001
000011
0000111
000011

Logical Column Vector Resulting from the Factoring
of the Paradigm Representation of атом

Fig. 2-4

The stem vector σ is mapped onto itself in step 4, resulting in a
permutation vector π which, in turn, is compared with the identity
permutation vector ι; in the case of атом, the permutation vector
π = (1,2,1,1,2,2,2,2,2,2,2,2) is obtained. The resultant logical vector
is then used to compress σ. The consequence of this operation is to
determine the vector ρ derived from σ by suppressing repeated components,
such that each distinct stem of σ is a component of ρ; thus for атом,
ρ = [ат, атом].

The index k is initialized in step 5 and decremented in step 6
for the iterative process that will create k dictionary entries from the
paradigm representation of w.


In step 7, the vector y_k, which will be the entry function vector
for the stem ρ_k, is set to all zeros. The dimension of y_k is the same
as the row dimension of the reference matrix for the type t_m of the word
under consideration.


In step 8, the components of the stem vector σ are logically reduced
by a vector, each component of which is the stem ρ_k. The columns of Π are
compressed by the resultant logical vector, each remaining row of Π becoming
a column of Ψ. The resultant subparadigm representation Ψ of w contains all
the inflected forms of the paradigm representation that result in the stem
ρ_k after being factored by the arbitrary algorithm.

Once more considering the paradigm representation of атом, for
the stem "ат",

Ψ(ат) = [Ns    As    Ds   ]
        [атом  атом  атому],

while for the stem "атом",

Ψ(атом) = [Gs     Is      Ps     Np     Gp      Ap     Dp      Ip       Pp    ]
          [атома  атомом  атоме  атомы  атомов  атомы  атомам  атомами  атомах].

In step 9, the index j is initialized and then decremented in step 10
for an iterative process on every component of the second row of the
subparadigm representation matrix Ψ.

In step 11, the arbitrary factoring algorithm operates on the
inflected form Ψ²_j, which is regarded as a vector with letters as components.
The resultant logical vector then is used to compress Ψ²_j, resulting in the
replacement of the inflected form by its affix. This process is repeated for
every inflected form in Ψ, such that, in the example,

Ψ(ат) = [Ns  As  Ds ]
        [ом  ом  ому].

In step 12, every column of the reference matrix V is mapped onto
the columns of the subparadigm representation matrix Ψ. Thus for every
column in V that also exists in Ψ there will be a "1" in the corresponding
element of the logical vector y_k.
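The compilation of entry function vectors described above can be sketched as follows. Everything named here is an assumption made for illustration: the function names, the longest-suffix stand-in for the arbitrary factoring algorithm F, the transliterated reference matrix, and the invented three-form paradigm. The sketch groups forms by stem with a dictionary instead of the permutation-vector comparison of step 4.

```python
def longest_suffix(affixes):
    """Hypothetical stand-in for F: the longest listed affix ending a string."""
    def factor(string):
        found = [a for a in affixes if string.endswith(a)]
        return max(found, key=len) if found else ""
    return factor


def entry_function_vectors(paradigm, reference, factor):
    """Map every distinct stem of a paradigm to its entry function vector."""
    split = []
    for attr, form in paradigm:
        affix = factor(form)                       # steps 2 and 11
        stem = form[: len(form) - len(affix)]      # step 3
        split.append((stem, attr, affix))
    vectors = {}
    for stem, _, _ in split:                       # step 4: distinct stems
        if stem not in vectors:
            # Steps 8-12: the (attribute, affix) pairs found with this stem
            # are matched against the columns of the reference matrix.
            pairs = {(a, x) for s, a, x in split if s == stem}
            vectors[stem] = [1 if col in pairs else 0 for col in reference]
    return vectors


affixes = ["am", "ami", "e", "ie", "i"]
reference = [("Ds", "e"), ("Ds", "ie"), ("Ps", "e"), ("Ps", "ie"),
             ("Ps", "ami"), ("Ps", "i"), ("Ip", "ami")]
paradigm = [("Ds", "dne"), ("Ps", "dni"), ("Ip", "dnami")]  # invented forms

vectors = entry_function_vectors(paradigm, reference, longest_suffix(affixes))
print(vectors)
```

A paradigm whose forms factor to more than one stem, as атом does in Fig. 2-4, would simply produce one dictionary key per stem.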

A technique for storing a mixed canonical stem and full paradigm
dictionary is suggested by the entry function vector. If, for a given word,
a mark were entered in some extra register to indicate that the word is to
be stored as a full paradigm, then a step that takes each inflected form in
the second column of Π as its own stem could be substituted
for steps 2 and 3 of Program 2-2 and an entry for every distinct inflected
form of a paradigm would be generated. With this technique, the dictionary
look-up process which will be described in the next section would not have
to be altered at all to look up words in the mixed stem and full paradigm
dictionary.

4. Analysis of Inflected Words

An entry in the idealized dictionary can be looked up by using the
stem of the word as the key. Once a dictionary entry has been found, it
is necessary to determine whether the affix factored from the text word
can occur legitimately with the stem of the dictionary entry. If so, the
lexical attributes (there may be more than one) of that affix are
displayed in a condensed logical vector, the reduced lexical attribute
vector, of the dimension of λ_m.

Once a dictionary entry has been found, it is necessary only to
compare the affix of the text word with the list of all possible affixes
that is stored in the form of the entry function vector (Program 2-3). If
the affix of the text word corresponds with one or more affixes on the list,
the corresponding lexical attributes are displayed.


Program 2-3

The whole second row of the reference matrix of morphological
type t_m is logically reduced in step 1 by a vector each component of which
is the affix α_w. The resulting logical vector is intersected with the
entry function vector which is stored in the dictionary entry. If the
resultant logical vector r has only zero components, the dictionary entry is
incompatible with the text word, that is, the word represented by the
dictionary entry is not the same word as the word encountered in the text.

If the dictionary entry is compatible, the first row of the reference
matrix is compressed by the logical vector r in step 2. The lexical
attribute vector is then mapped onto the compressed row of V. The
resultant logical vector is the reduced lexical attribute vector, λ_w.

As an example, the simplified reference matrix of Sec. 2 and the
subsequent entry function vector of Sec. 3 will be used:

V = [Ds  Ds  Ps  Ps  Ps   Ps  Ip ]
    [е   ие  е   ие  ами  и   ами],

y = [1,0,0,0,0,1,1].

If the affix "ие" is factored from a text word with a stem that results in
the look-up of the entry with the entry function vector y, then the logical
reduction (α_w = V²) = [0,1,0,1,0,0,0] and the intersection will result in
r = [0,0,0,0,0,0,0] and the subsequent interpretation that the affix is
incompatible with the stem of that dictionary entry. However, if the affix
"ами" is factored, then (α_w = V²) = [0,0,0,0,1,0,1], r = [0,0,0,0,0,0,1],
after which r/V¹ = [Ip], λ_w = [0,0,1], and the stem
of the dictionary entry is compatible with the affix "ами".
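The two steps of the analysis can be sketched in Python. The function name and the transliterated data are assumptions made for illustration; returning `None` for an incompatible entry is a choice of this sketch, not something the report specifies.

```python
def analyze(affix, entry_vector, reference, attributes):
    """Return the reduced lexical attribute vector, or None if incompatible."""
    # Step 1: compare the affix with the second row of V, then intersect
    # the result with the entry function vector of the dictionary entry.
    r = [int(a == affix) & bit for (_, a), bit in zip(reference, entry_vector)]
    if not any(r):
        return None
    # Step 2: compress the first row of V by r and mark which attributes occur.
    found = {attr for (attr, _), bit in zip(reference, r) if bit}
    return [1 if attr in found else 0 for attr in attributes]


attributes = ["Ds", "Ps", "Ip"]
reference = [("Ds", "e"), ("Ds", "ie"), ("Ps", "e"), ("Ps", "ie"),
             ("Ps", "ami"), ("Ps", "i"), ("Ip", "ami")]
y = [1, 0, 0, 0, 0, 1, 1]

print(analyze("ie", y, reference, attributes))
print(analyze("ami", y, reference, attributes))
```

With the transliterated affix "ie" the entry is rejected, and with "ami" only the instrumental plural bit is set, mirroring the two cases of the worked example.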
5. Summary

The  three  programs  described  in  this  chapter  constitute  necessary 
steps  for  the  compilation  and  operation  of  an  idealized  stem  dictionary. 
Keeping  in  mind  some  constraints  of  practical  data  processing,  the  most 
complex set of instructions has been used for the creation of the
reference  matrices,  a  task  that  has  to  be  performed  relatively  few  times, 
while  the  simplest  set  of  instructions  has  been  used  in  the  operation  of 
the  dictionary,  the  task  that  has  to  be  performed  most  frequently. 

Although  the  dictionary  described  above  is  idealized,  it  is 
highly  impractical.  The  necessary  operations  for  dictionary  compilation 
and  for  the  analysis  of  dictionary  entries  are  well  defined,  but  too 
many  machine  words  are  necessary  to  store  the  reference  matrices  and  the 
entry  function  vectors  even  on  a  binary  machine  where  each  bit  is 
individually  accessible.  Many  more  than  100  bits  are  needed  for  each 
entry  function  vector,  since  a  desinence  often  has  more  than  one  lexical 
attribute,  each  of  which  is  represented  by  a  bit  in  V  ,  and  the  desinence 
is  often  replaced  by  several  affixes.  To  operate  a  practical  stem 
dictionary, it is necessary to avoid using so much storage for each
dictionary  entry.  In  the  next  chapter,  where  the  Harvard  Automatic 
Dictionary  will  be  described,  several  methods  for  reducing  the  storage 
requirements  will  be  pointed  out. 
CHAPTER 3

THE HARVARD AUTOMATIC DICTIONARY: AN OPERATING CANONICAL STEM DICTIONARY

1. Introduction

An automatic dictionary is an essential component of an automatic
translator. A canonical stem dictionary, the Russian-to-English Harvard
Automatic Dictionary, has been put into operation over the last four years
and is controlled by a comprehensive set of programs and routines. Giuliano
(Ref. 1) and others (Refs. 2, 3) have described the solutions to many of the
problems related both to the compilation and modification of the dictionary
file and to the look-up of words in the dictionary. The solutions to the
remaining problems of word-by-word analysis are considered in this chapter.

The look-up of words* is effected by the Continuous Dictionary Run,
a set of programs which are executed continuously and in sequence (Fig. 3-1).
A Russian text is copied onto a magnetic tape in a format similar to the
original copy. The itemize program organizes the format of the input text,
placing each text word into an item of standard size. The affixes are
removed from the Russian words by the "inverse inflection algorithm" in the
split program, and the items are sorted into alphabetic order with the
remaining stem of the word as the primary key. Each stem is then looked up
in the dictionary and a complete dictionary entry is substituted for every
word in the text. At this point, each word is analyzed morphologically and
the syntactic information thus obtained is inserted into the dictionary item.

*Included in the definition of a text word or a word of a sentence will
be any punctuation mark, mathematical symbol, abbreviation, etc.
Continuous Dictionary Run

Fig. 3-1

In the next program the affixes are reattached to the stems and the words
are sorted back into text order. Following this, "homographs" are deleted,
and in the last program the Russian words are transliterated to permit their
representation by Latin letters. The output of the Continuous Dictionary
Run is referred to as the "augmented text". The programs of the Continuous
Dictionary Run have been described by Jones (Refs. 5, 6).
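The sort on the stem and the later return to text order can be sketched briefly. This is a toy illustration only: `dictionary_run`, `toy_split`, and the two-entry dictionary are hypothetical, and the morphological analysis, homograph deletion, and transliteration programs are left out.

```python
def dictionary_run(text_words, split, dictionary):
    """Toy pass: split, sort on stem, look up, and restore text order."""
    # Itemize and split: keep the text position with each (stem, affix) pair.
    items = [(position,) + split(word) for position, word in enumerate(text_words)]
    # Sort on the stem so the dictionary can be consulted in alphabetic order.
    items.sort(key=lambda item: item[1])
    looked_up = [(pos, stem, affix, dictionary.get(stem)) for pos, stem, affix in items]
    # Reattach the affixes and sort back into text order.
    looked_up.sort(key=lambda item: item[0])
    return [(stem + affix, entry) for _, stem, affix, entry in looked_up]


def toy_split(word):
    """Hypothetical splitter: treats one trailing 'a' as the affix."""
    return (word[:-1], "a") if word.endswith("a") else (word, "")


dictionary = {"stol": "table", "atom": "atom"}
augmented = dictionary_run(["stola", "atom", "stol"], toy_split, dictionary)
print(augmented)
```

Carrying the text position in every item is what allows the items to be sorted twice without losing the original word order.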

The classification scheme of Russian words and the inverse inflection
algorithm, both of which were developed more than three years ago, are
discussed in the light of the experience gained in working with them since
that time (Sec. 2). A mapping operation to correlate the classification
scheme with the inverse inflection algorithm is presented in Sec. 3.

The system devised to interpret false factoring (that is, the
factoring of a string of letters different from the expected affix) of
dictionary items by the inverse inflection algorithm (Sec. 4) and the system
devised to analyze the affixes of text words given their dictionary entries
(Sec. 5) are described in detail. An example of the output of the
corresponding programs is presented in Sec. 6.

Some statistics on the reliability of the Harvard Automatic
Dictionary are set forth in Sec. 7, while additional statistics for the
efficient operation of the analyzing programs are introduced in Sec. 8.


2.  Word  Classification  and  the  Inverse  Inflection  Algorithm 

The output of the Continuous Dictionary Run contains basically the
same grammatical information as the output of the idealized dictionary;
however, the mode of operation of the program differs greatly from that of
the idealized system. The various routines that constitute the dictionary
compilation system and the Continuous Dictionary Run were written over a
time span of several years. In the more recent routines the possibility
of using new symbols and formats was often limited by those already adopted
during earlier periods of research. This has had strong effects on the mode
of operation selected in these newer routines and has imposed many apparently
arbitrary constraints on the actual experimental system.

Some of the earlier phases of dictionary research are discussed with
the aim of describing them in terms of the idealized dictionary and of
pointing out changes that might be made should it become desirable to
reprogram the system.

A. Morphological Types and Their Classification

Before the description of the morphological types, Oettinger's
definition and notation for paradigms (Ref. 7), which will be used in this
chapter, is given. A paradigm of a word is the full set of inflected forms
of the word. Usually there are twelve inflected forms in noun paradigms
(Fig. 3-2). A reduced paradigm of a word is the set of distinct
representations (Fig. 3-3). Examination of Fig. 3-2 and Fig. 3-3 points out
that there is only one distinct representation "студента" for студента*_As
and студента*_Gs, and one distinct representation "студентов" for
студентов*_Ap and студентов*_Gp. This multiple usage of distinct
representations defines internal homography. A detailed description of the
different types of homography can be found in Chap. 9 of Ref. 7.
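The relation between a paradigm and its reduced paradigm can be made concrete with a short sketch. The function name and the list-of-pairs layout are assumptions of this illustration; the forms are the standard paradigm of the animate noun студент.

```python
from collections import defaultdict


def reduced_paradigm(paradigm):
    """Group lexical attributes under each distinct representation."""
    by_form = defaultdict(list)
    for attr, form in paradigm:
        by_form[form].append(attr)
    return dict(by_form)


paradigm = [
    ("Ns", "студент"), ("Gs", "студента"), ("As", "студента"),
    ("Ds", "студенту"), ("Is", "студентом"), ("Ps", "студенте"),
    ("Np", "студенты"), ("Gp", "студентов"), ("Ap", "студентов"),
    ("Dp", "студентам"), ("Ip", "студентами"), ("Pp", "студентах"),
]

reduced = reduced_paradigm(paradigm)
# Internal homographs are the representations shared by several members.
homographs = {form: attrs for form, attrs in reduced.items() if len(attrs) > 1}
print(len(reduced), homographs)
```

Twelve paradigm members collapse to ten distinct representations, and the two shared representations are exactly the internally homographic ones.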
студент*_Ns      студенты*_Np
студента*_As     студентов*_Ap
студента*_Gs     студентов*_Gp
студенту*_Ds     студентам*_Dp
студентом*_Is    студентами*_Ip
студенте*_Ps     студентах*_Pp

Paradigm of студент
Fig. 3-2


"студент"      "студенты"
"студента"     "студентов"
"студенту"     "студентам"
"студентом"    "студентами"
"студенте"     "студентах"

Reduced Paradigm of студент
Fig. 3-3


In the Harvard Automatic Dictionary inflected words have been
arbitrarily divided into three morphological types: nouns, adjectives,
and verbs. Each of these types was divided into a number of morphological
classes by Magassy. The morphological classes were kept as few in number
as possible to ease the burden of assigning new words to these classes and
to simplify the programs for inflecting these classes during the generation
of dictionary entries. Because of this morphological description, it is
possible to find some nouns such as портной belonging to an adjectival
morphological class.

In the classification scheme for every noun, adjective, and verb
class two important types of information are given (Fig. 3-4). The first
is a morphological description of the words that belong to the particular
class, stressing the behavior of the word "tails" that can occur. In the
example shown, the class N2 consists of all nouns ending in "ой", "ай", and
"ий", as well as of some of the nouns ending in "ей". Secondly, for each
class there is a generation rule specifying both how the generating stem is
formed (in the example, the word less the last letter) and which generating
affixes can be right-adjoined to the generating stem to form the members of
the reduced paradigm of the word. The generating stem of a noun in class
N2 can end in "о", "а", "е", or "и", and, in addition, can have the generating
affixes "я", "ю", "ем", "е", "и", "ев", "ям", "ями", and "ях" adjoined.
This list of generating affixes completes the description of this class of
noun.

CLASS   EXAMPLES   CLASS IDENTIFICATION

N2      строй      A class embracing:
        jonnaii    1. the nouns ending in (о, а, и) + й,
        гений      2. some nouns ending in е + й.
        музей

GENERATION RULES

Generating Stem (GS)    Generated Forms with Specified Generating Affixes

word - last letter      a. word         f. GS + и
                        b. GS + я       g.    + ев
                        c.    + ю       h.    + ям
                        d.    + ем      i.    + ями
                        e.    + е       j.    + ях

Definition and Description of Class N2
Fig. 3-4
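A generation rule of this kind can be sketched in a few lines of Python. The affix list is one reading of the generating affixes of class N2 (word less the last letter, followed by nine affixes), and the constant and function names are hypothetical.

```python
# Assumed reading of the class N2 generating affixes, forms b through j.
N2_AFFIXES = ["я", "ю", "ем", "е", "и", "ев", "ям", "ями", "ях"]


def generate_reduced_paradigm(word, affixes=N2_AFFIXES):
    """Form the generating stem and right-adjoin each generating affix."""
    stem = word[:-1]  # generating stem: the word less its last letter
    return [word] + [stem + affix for affix in affixes]


forms = generate_reduced_paradigm("музей")
print(forms)
```

Because several related paradigms share one class, a rule like this may generate forms that a particular word does not actually use.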

Three types of word endings now have been introduced in this thesis.
A desinence is a word ending that has lexical significance but which cannot
be identified formally. An affix is a word ending approximating a desinence
that is factored formally from a word by an appropriate algorithm. A
generating affix is an artificial word ending that is used to construct the
reduced paradigm of a word from both its canonical form and its class marker.

All the generating affixes of a class might not be needed to define
a single paradigm. Several related paradigms are sometimes fused into a
single class, or cumulative paradigm, as an alternative to maintaining
separate classes for dictionary compilation; thus, in class N4 the
instrumental singular is "дамой" or "улицей", but not "дамей" or "улицой".

Matejka (Ref. 10) eliminated (1) some ambiguities that had been
deliberately left in the morphological description of the grammatical
functions, and (2) the spurious forms, by separating the noun classes into
finer subdivisions. Every noun class was divided into animate and inanimate
categories, and these groups were further divided into as many as four
categories, although rarely into more than two.

For example, if nouns of class N1, such as студент (animate)
(Figs. 3-2 and 3-3) and стол (inanimate) (Figs. 3-5 and 3-6) were not
subdivided into animate and inanimate categories, paradigmatic forms ending
in "#"* would all be either nominative singular or accusative singular, and
forms ending in "а" would all be either accusative singular or genitive
singular. The identification for студент and стол would be:

"студент" would represent: студент*_Ns and студент*_As.

"студента" would represent: студента*_As and студента*_Gs.

"стол" would represent: стол*_Ns and стол*_As.

"стола" would represent: стола*_As and стола*_Gs.

*The symbol "#" is used to represent the null affix.
стол*_Ns      столы*_Np
стол*_As      столы*_Ap
стола*_Gs     столов*_Gp
столу*_Ds     столам*_Dp
столом*_Is    столами*_Ip
столе*_Ps     столах*_Pp

Paradigm of стол
Fig. 3-5


"стол"      "столы"
"стола"     "столов"
"столу"     "столам"
"столом"    "столами"
"столе"     "столах"

Reduced Paradigm of стол
Fig. 3-6

Given Matejka's subdivision, it is possible to reduce the multiple usages.
Animate nouns ending in "#" are nominative singular and inanimate nouns
ending in "а" are genitive singular. With the division, the identifications
for студент and стол are:

"студент" represents: студент*_Ns,

"студента" represents: студента*_As and студента*_Gs,

"стол" represents: стол*_Ns and стол*_As,

"стола" represents: стола*_Gs.

For each of the three morphological types, Matejka further constructed
a set of tables which list the lexical attributes for every desinence in
every class* (Fig. 3-7).

The result of these efforts was a complete definition of the
paradigms of Russian inflected words, including a table of lexical attributes
for the different members of the paradigm. Whereas Magassy and Matejka
stopped at this point, the information they obtained could have been used
to generate automatically a matrix Π (Sec. 2.3) for every word paradigm
by first generating the complete paradigm instead of the reduced paradigm
and then storing the lexical attributes with each member of the complete
paradigm in the paradigm generating routines.* The grammatical
specifications, as illustrated in Fig. 3-7, are a graphical representation of
the set of vectors of lexical attributes for each desinence (Sec. 2.2).

*Errors in the set of tables are listed in Appendix B.

                 N2 Inanimate,   N2 Animate,   N2 Inanimate,
                 Type 1          Type 1        Type 2

11    и          NpAp            Np            PsNpAp
12    я          Gs              GsAs          Gs
13    ю          Ds              Ds            Ds
14    й          NsAs            Ns            NsAs
15    ев         Gp              GpAp          Gp
16    ем         Is              Is            Is
17    ях         Pp              Pp            Pp
18    ям         Dp              Dp            Dp
19    ями        Ip              Ip            Ip

Grammatical Specifications for Noun Paradigms of Class N2:
Inanimate, Type 1; Animate, Type 1; and Inanimate, Type 2 (Ref. 10)

Fig. 3-7


The Affixes of Order One Generated by the Inverse
Inflection Algorithm

Fig. 3-8

B.  The  Inverse  Inflection  Algorithm 

Oettinger's inverse inflection algorithm is the arbitrary
factoring algorithm currently used to factor affixes for dictionary
compilation and for the Continuous Dictionary Run. This algorithm provides
a two-step process for factoring affixes from Russian words. As a first
step, one of three affixes, "#" (null affix), "сь", and "ся", is recognized.
These affixes are referred to as affixes of order zero and generally
describe the reflexive and reciprocal properties of Russian verbal forms.
As a second step, one of fifty-seven affixes (Fig. 3-8) is recognized;
these are referred to as affixes of order one. These affixes closely
coincide with the desinences of Russian words. Every Russian word has an
affix of order zero and an affix of order one. If nothing is factored, then
the affix "#" is assigned to the word.
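
The two-step process can be sketched in Python as follows. This is only an
illustration under stated assumptions, not Oettinger's routine: ORDER_ONE is
a small hypothetical subset of the fifty-seven affixes of Fig. 3-8, and the
rule of trying longer affixes first is an assumption, chosen because it
reproduces the factorings quoted in this chapter.

```python
# Affixes of order zero other than the null affix "#".
ORDER_ZERO = ("сь", "ся")

# Hypothetical subset of the fifty-seven affixes of order one.
ORDER_ONE = ("ому", "ами", "ом", "ов", "ам", "ах", "а", "у", "е", "ы")


def strip_longest(word, affixes):
    """Remove the longest listed affix that ends `word`; "#" if none does."""
    for affix in sorted(affixes, key=len, reverse=True):
        if word.endswith(affix):
            return word[: -len(affix)], affix
    return word, "#"


def inverse_inflection(word):
    """Return (stem, affix of order one, affix of order zero)."""
    rest, order_zero = strip_longest(word, ORDER_ZERO)  # first step
    stem, order_one = strip_longest(rest, ORDER_ONE)    # second step
    return stem, order_one, order_zero


for text_word in ("атоме", "атому", "стол", "свесь"):
    print(text_word, inverse_inflection(text_word))
```

Every word receives both an affix of order zero and an affix of order one,
the null affix "#" standing in whenever nothing is factored.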

The inverse inflection algorithm operates efficiently on noun and
adjective paradigms, which usually require only one stem entry in the
dictionary. The factorization of affixes in verb paradigms is less efficient
than the factorization of affixes in noun and adjective paradigms, and
generally three or four stems are required to define a paradigm completely.
To separate the grammatical functions of the stems, more extensive
coding of the verb entries than of the noun and adjective entries has proved
necessary (see Sec. 3). The inclusion of six more affixes would reduce the
number of verb stems significantly (Table 3-1). The suggested affixes are
"mo", "йте", "л", "ла", "ло", and "ли". When only the most populous verb
classes, V1, V3, and V4, were considered, a rough estimate of the effect of
the inclusion of these affixes indicated that the dictionary would be reduced
in size by about bf* with a potentially great simplification in the coding
of the verb entries. It appears upon only superficial examination that this
addition would not add much to the problem of false factoring (Sec. 3B).

As an example, in the paradigm of основать, which is in class V3, five stems
are generated: "основа", "осн", "осну", "основал", and "оснуйт" (Fig. 3-9).
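
The effect of the suggested affixes on this paradigm can be checked with a
short Python sketch. The affix lists are assumptions: BASE contains only the
affixes that occur in the forms of Fig. 3-9, SUGGESTED contains the suggested
affixes that occur in this paradigm, and the longest-match rule in factor()
is assumed rather than taken from the actual routine.

```python
# Affixes read off the forms of Figs. 3-9 and 3-10; the real algorithm
# recognizes fifty-seven affixes of order one, so these lists are illustrative.
BASE = ["ть", "а", "ую", "о", "ешь", "и", "ет", "й", "ем", "е", "ете",
        "я", "ют", "в", "вши"]
SUGGESTED = ["йте", "л", "ла", "ло", "ли"]

# Members of the reduced paradigm of основать.
FORMS = ["основать", "основала", "осную", "основало", "оснуешь",
         "основали", "оснует", "оснуй", "оснуем", "оснуйте", "оснуете",
         "оснуя", "оснуют", "основав", "основал", "основавши"]


def factor(word, affixes):
    """Split off the longest matching affix (assumed rule); "#" if none."""
    for affix in sorted(affixes, key=len, reverse=True):
        if word.endswith(affix):
            return word[: -len(affix)], affix
    return word, "#"


def stems(forms, affixes):
    """Set of distinct factored stems for a list of inflected forms."""
    return {factor(form, affixes)[0] for form in forms}


print(sorted(stems(FORMS, BASE)))              # five stems
print(sorted(stems(FORMS, BASE + SUGGESTED)))  # three stems
```

Under these assumptions the stems "основал" and "оснуйт" disappear once the
suggested affixes are recognized, which agrees with the entry for class V3 in
Table 3-1.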


        Number of           Number of           Number of           Number of
          Stems               Stems               Stems               Stems
Class    Old  New   Class    Old  New   Class    Old  New   Class    Old  New

V1        4    2    V5.1      3    2    V8.2      4    4    V13       5    4
V2        3    2    V5.2      3    2    V9        5    4    V14       5    3
V2.01     4    2    V5.3      5    3    V9.1      4    3    V15       4    2
V3        5    3    V5.4      3    2    V10       3    2    V15.1     4    3
V4        3    2    V5.41     3    2    V10.01    3    2    V15.2     4    2
V4.01     4    2    V6        4    2    V10.1     4    3    V16       6    6
V4.02     5    3    V6.1      5    3    V10.2     3    2    V17       4    3
V4.1      4    3    V6.2      5    3    V10.3     3    2    V18       5
V4.11     5    3    V7        5    4    V10.4     4    3    V19       4    3
V4.2      3    2    V8        4    3    V11       5    4    V20       4    3
V4.21     4    2    V8.1      4    4    V11.1     3    2    V21       7    5
V5        4    2    V8.11     5    5    V12       5    3

The Reduction in the Number of Verb Stems per Class if Affixes
"йте", "л", "ла", "ло", and "ли" were Included
in the Inverse Inflection Algorithm

TABLE 3-1


"основа-ть"     "основал-а"
"осн-ую"        "основал-о"
"осну-ешь"      "основал-и"
"осну-ет"       "осну-й"
"осну-ем"       "оснуйт-е"
"осну-ете"      "осну-я"
"осну-ют"       "основа-в"
"основал-#"     "основа-вши"

Reduced Paradigm of основать
Using the Inverse Inflection Algorithm

Fig. 3-9


"основа-ть"     "основа-ла"
"осн-ую"        "основа-ло"
"осну-ешь"      "основа-ли"
"осну-ет"       "осну-й"
"осну-ем"       "осну-йте"
"осну-ете"      "осну-я"
"осну-ют"       "основа-в"
"основа-л"      "основа-вши"

Reduced Paradigm of основать
Using the Suggested Modified
Inverse Inflection Algorithm

Fig. 3-10


If the suggested affixes were factored, only three stems would remain:
"основа", "осн", and "осну" (Fig. 3-10).


3.  Mapping  of  Desinences  Onto  Affixes 

It is convenient to determine the lexical attributes associated with
each of the set of affixes for each class of words before the programs
(Sec. 5) which analyze the words are considered. As in the case of the
idealized dictionary (Sec. 2.2), Matejka's tables of lexical attributes are
given in terms of desinences rather than of affixes, and it is necessary
to map the set of desinences onto the set of affixes in order to determine
the relationship between the affixes and the lexical attributes.

The procedure that is followed approximates the procedure used to
obtain V in the idealized dictionary, in particular steps 10 to 21 in
Program 2-1. The technique varies from that used in Program 2-1,
since the information available as input is somewhat different from the
idealized case.

Several approaches other than the one to be followed could be used
to obtain the affix-lexical attribute relationships. One that was mentioned
in Sec. 2 consists of modifying the paradigm inflection routines, so that
complete paradigms rather than reduced paradigms are generated. The lexical
attributes are associated immediately with the generating affixes of the
members of the paradigm rather than with the desinences. Although
conceptually simple, this approach would involve extensive recoding of
present programs.

A second possible approach is suggested by the idealized dictionary.
It was pointed out in the summary of Chap. 2 that a major defect of the
idealized dictionary was the amount of storage space required. The main
difficulty is the fact that each desinence maps onto so many affixes that
the entry function vectors, which are stored in each dictionary entry, require
hundreds of bits. Since, for a given class of words, most of these bits
are never used, a practical solution would be to increase the number of
reference matrices, so that there is a reference matrix for each of Magassy's
morphological classes. The number of columns in each matrix would be
drastically reduced, since only a small number of the affixes, approximately
twenty, would be used within any one class. Thus, the entry function vectors
would be correspondingly reduced, and the simple identifying procedure of
Program 2-3 could be used with only slight modification.* This solution
would not be practical on the Univac I, the computer currently being used
at Harvard University, since this computer is not a binary machine and the
individual bits are not accessible to the programmer. It would be ludicrous
to simulate a binary machine by using a character position to represent a
single bit.

* This approach was suggested by D. W. Davies of the National Physical
Laboratory, Teddington, England, during his visit to Cambridge, Mass.,
in December, 1959.

In  the  scheme  to  be  described  in  this  section,  it  is  a  moot  question 
whether  the  desinences  or  the  generating  affixes  are  being  mapped  onto  the 
affixes.  The  actual  process  involves  both,  since  on  the  one  hand  the 
generating  affixes  will  be  manipulated  to  determine  the  affixes,  but  on 
the  other  hand  the  lexical  attributes  which  are  associated  with  the 
desinences  will  be  assigned  to  the  generating  affixes. 

This section has been divided into two parts, the first dealing
with the mapping technique (Sec. 3A) and the second dealing with the problems
that evolved from the adopted procedure as well as with their solution
(Sec. 3B).


A.  Correlation  of  Generating  Affixes  and  Affixes 

The generating affixes are mapped onto the affixes for each of
Magassy's morphological classes. Later the lexical attributes associated
with the desinences will be associated with the generating affixes. This
information, together with the results of the mapping operation, will
determine the program for a logical tree for each morphological type. One
of these trees will be scanned every time a dictionary entry is analyzed
(see Sec. 5). Although the programming for a tree is more complex than that
for Program 2-3, the time needed for analysis will be of the same order of
magnitude, since only a minute section of the tree will be scanned during
the analysis of any given word.

The technique for mapping Magassy's generating affixes onto the
affixes defined by the inverse inflection algorithm is shown in Program 3-1
for a generating affix g in a class c of one of the three morphological
types. This technique can be used with any system of morphological classes
and any factoring algorithm. A vector a is used (not necessarily the a of
Chap. 2), each of whose components is an affix factored by the inverse
inflection algorithm, and which includes every affix once and only once,
the order of the components being immaterial.


Symbol    Function

g         Generating affix being mapped
li        Column vectors
b^c       A possible ending of the generating stem in word class c
F(x)      Inverse inflection algorithm on word x
θ         Affixes onto which g can be mapped

Definition of Symbols
TABLE 3-2


Program for the Mapping of Generating Affixes onto Affixes

Program 3-1

Every component of a column vector y is defined in step 1 as the
generating affix g that is being correlated. This vector is adjoined to
the column vector b^c in step 2, where each component of b^c represents one
of the possible endings of the generating stem from Magassy's tables in
class c. (Dashes have been used in the representation of b^c, since it might
be necessary to know more than the last letter in each component.) The
effect of this operation is to attach the generating affix to each possible
generating stem.

For class N2 (Figs. 3-4 and 3-7) and generating affix "h",


In step 3, every component of the adjoined vector is considered a vector
and is factored by the inverse inflection algorithm F, and the resulting
logical vector is used to compress the component itself. Every compressed
component is considered as a component of the row vector x. The affix vector
a is mapped onto x in step 4, and the resultant vector is used to reduce a,
giving a vector θ, each of whose components is one of the affixes that
correlates with the generating affix g. In the same example,


The results of the mapping operation are shown in Appendix C.
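
The four steps described above can be sketched in Python. This is an
illustration only, not a transcription of Program 3-1: the affix list and the
stem endings are hypothetical stand-ins for the affix vector a and for
Magassy's tables, the dashes of b^c are omitted, and factor() assumes a
longest-match rule.

```python
# Hypothetical affix vector a (each affix once, order immaterial).
AFFIXES = ["#", "а", "у", "е", "ы", "ом", "ов", "ам", "ах", "ому", "ами"]

# Hypothetical generating-stem endings for one class, e.g. стол and атом.
STEM_ENDINGS = ["л", "ом"]


def factor(word, affixes):
    """Split off the longest matching affix (assumed rule); "#" if none."""
    for affix in sorted(affixes, key=len, reverse=True):
        if word.endswith(affix):
            return word[: -len(affix)], affix
    return word, "#"


def map_generating_affix(g, stem_endings, affixes):
    """Return the affixes onto which generating affix g can be mapped."""
    suffix = "" if g == "#" else g  # the null affix adds no letters
    # Steps 1 and 2: attach g to every possible generating-stem ending.
    strings = [ending + suffix for ending in stem_endings]
    # Step 3: factor each string and keep only the factored affix.
    x = [factor(s, affixes)[1] for s in strings]
    # Step 4: reduce the affix vector to the components that occur in x.
    return [a for a in affixes if a in x]


print(map_generating_affix("у", STEM_ENDINGS, AFFIXES))
print(map_generating_affix("#", STEM_ENDINGS, AFFIXES))
```

With these inputs the generating affix "у" correlates with both "у" and
"ому", the second arising only for stems that end in "ом".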
B.  False Factoring

The factoring algorithm, under certain conditions, can factor part
of a stem with the desinence to obtain the factored affix. The residue of
the word, the factored stem, will be shorter (will contain fewer characters)
than the factored stem of another member of the same paradigm where this
phenomenon does not take place. In such cases the canonical stem is not
unique. In the example of Sec. 2.3, "атоме" will be factored into the
stem "атом" and the affix "е" while "атому" will be factored into the stem
"ат" and the affix "ому". Likewise, if the factoring algorithm cannot
factor the entire desinence, a factored stem can be longer than the normal
factored stem of a paradigm. For example, while "оснуй" is factored into
"осну" and "й", "оснуйте" is factored into "оснуйт" and "е" (Fig. 3-9).

Both extra long and extra short stems, which will be referred to as
anomalous stems, exist in the Harvard Automatic Dictionary.

Anomalous stems are a natural consequence of factoring even in the
idealized dictionary, since independent of coded syntactical information,
it is impossible to write a factoring algorithm that will recognize whether
or not a string of letters represents some desinence. In the case of the
idealized stem dictionary, an extra dictionary entry is generated with its
own entry function vector, whenever an anomalous stem occurs. Similarly,
in the Harvard Automatic Dictionary each anomalous stem generates its own
dictionary entry. The difficulty lies in the fact that, prior to this work,
there was no information in the experimental dictionary equivalent to the
entry function vector to indicate that a stem is anomalous. This lack of
information was the cause for an excessive number of stem homographs. Since
many of these homographs from the experimental dictionary do not appear as
homographs in the idealized dictionary, they are nonessential homographs
in the experimental dictionary.

An example of the nonessential stem homography in the experimental
dictionary is shown by the reduced paradigms of two Russian nouns, вал and
валюта (Figs. 3-11 and 3-12). The string "валют" from the paradigm of
валюта is factored into the stem "вал" and the affix "ют" by the inverse
inflection algorithm. This stem is identical with the stem of вал.
Therefore, any time that either any member of the reduced paradigm of вал
or the paradigmatic form "валют" appears in a text, both dictionary entries
with the stem "вал" are selected. In the idealized dictionary different
affixes would be represented in the two entry function vectors, so that one
of the dictionary entries would always be incompatible.

Another problem occurs in the paradigm of атом (Fig. 3-13). Two
distinct stems are factored in this paradigm, and the affix "ом" can be
associated with both of them. The affix "ом" is factored both from the
string "атом" and from the string "атомом". It is therefore necessary to
be able to determine when the affix "ом" should be the resultant affix of
the desinence "ом" and when it should be the resultant affix of the desinence
"#". This is an example of the artificial affix homograph. This type of
homograph is also nonessential, since in the idealized dictionary the
appropriate lexical attribute would be listed with the affix in both cases.


"вал-#"      "вал-ы"
"вал-а"      "вал-ов"
"вал-у"      "вал-ам"
"вал-ом"     "вал-ами"
"вал-е"      "вал-ах"

Reduced Paradigm of вал
Fig. 3-11


"валют-а"    "вал-ют"
"валют-ы"    "валют-ам"
"валют-е"    "валют-ами"
"валют-у"    "валют-ах"
"валют-ой"

Reduced Paradigm of валюта
Fig. 3-12


"ат-ом"      "атом-ы"
"атом-а"     "атом-ов"
"ат-ому"     "атом-ам"
"атом-ом"    "атом-ами"
"атом-е"     "атом-ах"

Reduced Paradigm of атом
Fig. 3-13

As is shown in Appendix C, every affix appearing in the fourth column
is the result of a factoring that produced an anomalous stem; for instance,
in the paradigm of валюта (class N4) only the form "валют" is factored
into an anomalous stem.

The following is a discussion of the various operations that have
been adopted to patch the experimental dictionary so that its output should
be identical with the output of the idealized dictionary.

When a single affix is associated with an anomalous stem, the entry
function vector for the anomalous stem contains only one 1. It is a
simple matter to put a mark somewhere in the existing dictionary entry to
indicate that the item should be treated as a fully inflected item. Then
the affix of the text word whose stem matches the stem of the dictionary
entry should be compared with the single affix stored in the dictionary.
Giuliano already adopted such a technique with respect to stems with zero
or one letter to reduce homography. If the experimental dictionary affix
is not identical with the affix of the text word when the special mark is
present, then the dictionary entry is incompatible, that is, it is not the
one being sought.
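
The comparison just described can be sketched in Python. The Entry record and
its field names are hypothetical, introduced only for this illustration; the
actual dictionary entry is a 30-word Univac I item and is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    stem: str      # factored stem under which the entry is filed
    affix: str     # the single affix stored in the entry
    marked: bool   # True if the stem has been marked as anomalous
    source: str    # paradigm the entry came from (label for this example)


# The two entries that share the stem "вал".
ENTRIES = [
    Entry("вал", "#", False, "вал"),
    Entry("вал", "ют", True, "валюта"),
]


def compatible(entry, text_stem, text_affix):
    """Apply the special-mark test to one selected dictionary entry."""
    if entry.stem != text_stem:
        return False
    if entry.marked:
        # A marked entry behaves like a fully inflected item.
        return entry.affix == text_affix
    return True


def select(text_stem, text_affix):
    """Sources of the entries that survive the test."""
    return [e.source for e in ENTRIES if compatible(e, text_stem, text_affix)]


print(select("вал", "а"))
print(select("вал", "ют"))
```

For the text word "вала" the marked entry from валюта is rejected. For
"валют" the unmarked entry of вал is not rejected by this test alone; in this
sketch that decision is assumed to be left to the later analysis of the
affixes permitted in its class.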

It should be pointed out that during dictionary compilation in the
experimental system, when the inflected forms are generated from the
canonical forms, they are generated in the order given in the reproduction
of Magassy's table in Ref. 7, as illustrated in Fig. 3-4. When these
paradigms are condensed by a later routine, the affix from the first form
encountered with a given stem is stored in the dictionary. It is indeed
fortunate that the affix normally stored with the generating stem never
causes confusion with the affixes that form anomalous stems. In particular,
it is fortunate that the form "атомом" is not the first form with the stem
"атом" that is generated, since if it were, the affix "ом" would be stored
in the dictionary entry of "атом", while "ом" is already stored with the
dictionary entry of "ат", originating from the form "атом". If "атомом" were
the first generated form with the stem "атом", and "атом" were the first
generated form with the stem "ат", then there would be no automatic way of
distinguishing that the latter stem is the anomalous one.

There remains a small group of noun paradigms, such as that of атом,
which requires special treatment because there is more than one affix
associated with an anomalous stem; for example, both "ом" and "ому" are
factored, leaving the stem "ат". Since there is no coding present in the
experimental dictionary entry to distinguish the different inflected forms,
and since fortunately there appear to be never more than two affixes
associated with an anomalous stem, an extra dictionary entry can be
generated, and each of the two anomalous stem inflected forms can be treated
as a fully inflected item, thereby increasing the size of the dictionary by
only 0.5%. This increase would not occur in the idealized dictionary, because
the entry of the stem "ат" would contain all the necessary information about
both affixes.

Among the verb paradigms in the experimental dictionary there are
many more anomalous stems, owing to a large number of verb desinences that
are not factored by the inverse inflection algorithm. If the paradigm of
подходить is used as an example, four unique stems are generated: "подход",
"подходи", "подходил", and "подхож". These stems have seven, three, four,
and one affixes associated with them, respectively (Fig. 3-14). Only in the
stem "подхож" did it seem practical to mark the stem as the noun and
adjective anomalous stems were marked, since so many affixes are associated
with the other stems. If the same system were adopted with the other stems,
the size of the dictionary would increase by an intolerable amount, thus
defeating the main advantage of a stem dictionary over a full paradigm
dictionary.


"подходи-ть"    (B0)      "подходил-#"    (B3)
"подхож-у"      (B1)      "подходил-а"    (B3)
"подход-ишь"    (B1)      "подходил-о"    (B3)
"подход-ит"     (B1)      "подходил-и"    (B3)
"подход-им"     (B1)      "подход-и"      (B4)
"подход-ите"    (B1)      "подход-я"      (B5)
"подход-ят"     (B1)      "подходи-в"     (B6)
"подходи-вши"   (B6)

Reduced Paradigm of подходить
with Associated Tense and Mood Indices

Fig. 3-14


The problem has not become acute since, when the dictionary was being
compiled, it was recognized that the multiplicity of stems occurring in every
verb paradigm would cause stem homographs. A coding scheme, the tense and
mood indicators, was incorporated into the dictionary entries to identify
the grammatical functions that the stem and any of its affixes could assume
(Table 3-3). The correct coding associated with the inflected forms in
Fig. 3-14 has been placed in parentheses next to each inflected form. The
coding, as it would appear in the third semiorganized word for each stem,
is shown in Fig. 3-15.

B0 - infinitive
B1 - present indicative
B2 - future indicative
B3 - past indicative
B4 - imperative
B5 - past gerund
B6 - present gerund

Tense and Mood Indicators in the Third
Semiorganized Word of Verb Entries

TABLE 3-3


"подход"      B1B4B5
"подходи"     B0B6
"подходил"    B3
"подхож"      B1

Tense and Mood Coding in the Third Semiorganized
Word for the Stems of подходить

Fig. 3-15


As an example of how the tense and mood coding helps in analysis,
consider the reduced paradigm of the noun подход (Fig. 3-16). Stem
homography exists with the stem "подход", which is common to both the noun
and verb paradigms. There is no essential homography in the experimental
dictionary, since all the affixes associated with the two stems "подход" in
the two paradigms are different. For example, if the string "подхода" occurs
as a text word, both stems "подход" will be selected during dictionary
look-up. The affix "а" in class V4 represents the single grammatical function
past indicative, but the past indicative cannot be associated with the verb
stem "подход", as shown by the fact that there is no "B3" coded in its third
semiorganized word. Therefore "подхода" cannot be an inflected form of the
verb paradigm. However, since the string contains an affix that can belong
to the paradigm of the noun подход, the string can be correctly identified
as a noun.


"подход-#"     "подход-ы"
"подход-а"     "подход-ов"
"подход-у"     "подход-ам"
"подход-ом"    "подход-ами"
"подход-е"     "подход-ах"

Reduced Paradigm of подход
Fig. 3-16


In the idealized dictionary the special coding would not be necessary,
since the lexical attributes would be represented in the reference matrix
and the entry function vector.
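
The test applied to "подхода" can be sketched in Python. The two tables are
assumptions transcribed from Figs. 3-14 and 3-15 for this one paradigm; they
are not the dictionary's actual storage format, and verb_functions() is a
hypothetical helper.

```python
# Grammatical functions of each affix, read off Fig. 3-14 (class V4).
AFFIX_CODES = {
    "ть": {"B0"}, "у": {"B1"}, "ишь": {"B1"}, "ит": {"B1"}, "им": {"B1"},
    "ите": {"B1"}, "ят": {"B1"}, "#": {"B3"}, "а": {"B3"}, "о": {"B3"},
    "и": {"B3", "B4"}, "я": {"B5"}, "в": {"B6"}, "вши": {"B6"},
}

# Tense and mood coding stored with each verb stem, as in Fig. 3-15.
STEM_CODES = {
    "подход": {"B1", "B4", "B5"},
    "подходи": {"B0", "B6"},
    "подходил": {"B3"},
    "подхож": {"B1"},
}


def verb_functions(stem, affix):
    """Functions allowed by both the affix and the stem's coding."""
    return AFFIX_CODES.get(affix, set()) & STEM_CODES.get(stem, set())


# "подхода": the affix "а" needs B3, which the verb stem "подход" lacks.
print(verb_functions("подход", "а"))
# The affix "и" is resolved differently by the two stems it can follow.
print(verb_functions("подход", "и"), verb_functions("подходил", "и"))
```

An empty result means that the verb entry is incompatible with the text word,
so only the noun entry for "подход" remains.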
4.  The Anomalous Stem Program

Anomalous stems are detected and marked automatically in the
experimental dictionary, so that the analyzing programs (see Sec. 5) can
recognize that only the single affix that is stored in the dictionary entry
is associated with the stem. This serves two distinct purposes: (1) The
affix stored in a marked dictionary entry is compared with the affix of the
text word. If they do not match, the dictionary item is incompatible. (2) The
mark indicates that the affix stored in the dictionary was caused by false
factoring.

The anomalous stem program has three outputs: (1) a tape containing
all the input items with an appropriate mark in the anomalous stem items,
(2) a list of potential dictionary entries generated by the program
(Sec. 4A), and (3) a list of potential dictionary entries which must be
studied further (Sec. 4B).

Every dictionary entry is stored as a 30-word item, a size chosen
both to be compatible with the block transfer operations (60 words at a time)
of the Univac I and to have sufficient space available to store the necessary
syntactic and lexical information and various forms of experimental markings.
Since the morphological and syntactic information is contained in fewer than
ten of these machine words, it has been feasible to compress the information
of immediate interest into 10-word items, which will be referred to as
texthadic items. An analyzed 30-word item and the condensed texthadic
item are illustrated in Fig. 3-17. The columnar layout of the texthadic,
with reference to the 30-word item, is listed in Table 3-4. The anomalous
stem program will be described in terms of the 30-word item.

Column 1 - First English equivalent from dictionary. (Word 5)
Column 2 - Class marker. (From Word 3)
Column 3 - Russian word transliterated with affix attached. (Words 0-2)
Column 4 - Text serial number. (From Word 4)
Column 5 - Organized word. (Word 26)
Column 6 - First word of interpreted information. (Word 24)
Column 7 - Second word of interpreted information. (Word 27)
Column 8 - Third semiorganized word. (Word 29)
Column 9 - Dictionary serial number. (Word 25)

Columnar Layout of Texthadic with References to 30-word Item

TABLE 3-4
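
The layout of Table 3-4 can be written as a small Python mapping. This is a
sketch under assumptions: the field names are hypothetical, machine words are
represented as plain strings, and columns 2 and 4 copy the whole of words 3
and 4 although the table takes only part of each.

```python
# (hypothetical field name, words of the 30-word item), in column order.
LAYOUT = [
    ("english", (5,)),
    ("class_marker", (3,)),
    ("russian", (0, 1, 2)),
    ("text_serial", (4,)),
    ("organized", (26,)),
    ("interpreted_1", (24,)),
    ("interpreted_2", (27,)),
    ("semiorganized_3", (29,)),
    ("dictionary_serial", (25,)),
]


def condense(item):
    """Pick the words listed in Table 3-4 out of a 30-word item."""
    if len(item) != 30:
        raise ValueError("expected a 30-word item")
    return {name: [item[i] for i in words] for name, words in LAYOUT}


sample = [f"w{i}" for i in range(30)]  # placeholder machine words
texthadic = condense(sample)
print(texthadic["russian"], texthadic["organized"])
```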

Only the first English correspondent from the 30-word dictionary
item is transferred to the 10-word texthadic item. This correspondent has
little significance in the translations of the examples that will be given
throughout this thesis. The purpose of including the single correspondent
in the texthadic items is to help the reader who has no knowledge of Russian
to identify individual Russian words.

The program has been designed with two purposes in mind, first, to
update the entire experimental dictionary when a change is made in the word
analyzer program and, second, to process new items before they are merged
into the existing experimental dictionary. It should be stressed once more
that this program would not be necessary in the idealized dictionary, since
the necessary markings exist in the reference matrix and the entry function
vector.

A.  The Identification of Anomalous Stems

Russian words are considered to be divided into six morphological
types (Table 3-5), with a distinct format for the representation of the
grammatical properties of each type. These six types are represented by
appropriate alphabetic symbols in character position 1 of word 3 of the
30-word item as shown in Table 3-5.

(1)  Noun (N)

(2)  Adjective (A)

(3)  Verb (V)

(4)  Pronoun (P)

(5)  Numeral (D)

(6)  Indeclinable (I)

Morphological Types in the Harvard Automatic Dictionary

TABLE 3-5

The program examines the class marker and affix of every 30-word
item. The logic of the program is expressed in a tree. The program branches
initially on the three productive morphological types, the noun, adjective,
and verb. The secondary branch for each type is on the various classes among
which anomalous stems can occur. The third and last branch is on the affixes
that are factored with anomalous stems. One of these affixes is stored in
the dictionary during compilation. The classes among which anomalous stems
occur can be identified easily, since they are the classes with affixes in
the fourth column of the table of Appendix C. A complete list of these
affixes from Appendix C is shown in Table 3-6.

Class      Affixes

A3         aw at B SB eu ,j QT MM MT OB OM yr UM lOT HM AT
A4         BniM
A5         e a
A8         ax ]|3mM MX ux >oc
N (any)    Ob CM
N1         aM ar B SB eu^ ei^lerS eie^ km mt^ MTe3 OB oij<^ ouy^ yr
           HM KIT MM MT
N1.1       ax MX UX MX
N1.2       SB SM eT OB OM
N2         aM ee^ eft^ Ke^ Mft2 oe^ ofi3
N2.1       eft bio
N3         exe MT9 Tb
N3.05      Tb
N4         aM aT B ea eM-*- eijiyi eT^ ere^ mm mt® MTe3 OB oii4 oMy^ .VT
           HM KT MM MT
N4.05      SM
N4.06      OM
N4.1       BmH ax MX HX MX
N5         aMM eT8 PIMM Mre tb HMM MMM
N5.05      aM eel efil oe^ oft2 yK> meS Ha3 fuj
N5.2       eftl biqI
N5.3       Mft bH)
N6         Tb
N6.1       emb Miiib
N7         Mft
N8         aw aT ax B eB _ ero ewl ewyl eT^ exeS MM Mt3 MTe3 MX OB
           oro OM^ owy* yT hu UX KT MM MT MX
N8.1       OM
N8.15      SM PQi
N10        piel Piiil
N11        eft bw
N11.1      MM bH)
N11.2      bH)
V (any)    e
V1         as MM
V3         yio
V4         awM aM MMM y yro UMM MMM
V4.01      7
V4.02      HM eft oft yw
V4.1       7
V4.11      7
V5.2       aMM MMM HMM MMM
V5.3       aM eft Pift oft yK) Uft MM
V6.1       oft
V6.2       aMM mat bimm mmw
V10.1      PIMM
V10.4      PIMM
V12        aM
V13        eft bH)
V14        oft
V15        yro
V18        aM

Affixes Marked by Anomalous Stem Program
(Superscripts denote automatically generated pairs.)

TABLE 3-6

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A  30-word  item  containing  an  anoaaloue  stem  is  marked  with  a  "1"  in 
oharao^er  position  12  of  the  organised  word,  word  26  of  the  30-word  item. 

If  the  program  finds  that  the  30-word  item  has  two  affixes  associated  with 
the stem, as in "aT-ou", it automatically generates the second member of the
pair, in this case "ar-owy", on a separate tape. A "1" is inserted into
character position 12 of the organized word, an "H" is inserted into
character position 4 of word 4 to identify the source of such an entry, and if
there is an "F" in character position 7 of word 3, indicating that this is
a canonical form, that is, the form from which the word was generated, the
"F" is deleted. This output then can be inserted with other new entries
into  the  dictionary  in  a  single  pass. 

To  facilitate  changes  in  the  program,  any  previous  "1"  in  character 
position  12  of  word  26  that  was  inserted  by  previous  versions  of  this  routine 
is erased. This makes it possible to update the dictionary quickly if the
program  has  to  be  altered. 

When  the  Harvard  Automatic  Dictionary  was  first  modified  by  this 
program in November 1959, only 89 blocks due to anomalous stems with two
affixes had to be added to the 15,000 blocks which existed at the time.

B. Exceptions

Among the verb paradigms that have been assigned to classes V4,
V4.01, V6.2, and V8, there exist several where the desinences "ь" or "я"
are factored together with part of a stem ending in "с", as with "св-е-сь",
an imperative form of the verb свесить. The inverse inflection algorithm
factors an affix of order zero and then possibly an affix of order one. In
the above example the stem is "св", the affix of order one is "е", and the
affix of order zero is "сь". These verbal forms must be identified, so that
they will not be analyzed as ordinary reflexive forms.

Because of the large number of reflexive verbs, it is too much of
a burden for the anomalous stem program to identify automatically these rare
nonreflexive inflected forms. A special policing subroutine of the anomalous
stem program prints out on the third output tape of the program all the
stems in these classes that end in "с". These stems then can be inspected
visually, and a "2" can be inserted into character position 12 of the
organized word of those 30-word items which contain such a special anomalous
stem. For example, if the paradigm of свесить (Fig. 3-18) is considered,
only the stem "свес" can be identified automatically. Once the stem "свес"
is found, the entire paradigm can be studied, and the appropriate anomalous
stem "св" marked.

"свеси-ть"
"свесил-а"
"свеш-у"
"свесил-о"
"свес-ишь"
"свесил-и"
"свес-ит"
"св-е-сь"
"свес-им"
"св-е-ся"
"свес-ите"
"свеси-в"
"свес-ят"
"свесил-#"
"свесьт-е"

Reduced Paradigm of свесить
Fig. 3-18

One other potential problem is treated by this policing subroutine.
The generating stems of the verbs in class V7 were not defined in sufficient
detail by Magassy to determine the affixes that might be factored from the
form containing the null generating affix (Fig. 3-19). The canonical form


CLASS:    V7
SAMPLES:  мерзнуть, возникнуть

CLASS IDENTIFICATION

A class embracing the verbs:
1. ending in нуть;
2. losing ну in the masculine past tense forms.

GENERATION RULES

Generating Stem (GS):  word - last four letters.

Generated Forms with Specified Generating Affixes:

a. word            j. GS + ло
b. GS + ну         k.    + ли
c.    + нешь       l.    + ни
d.    + нет        m.    + ните
e.    + нем        n.    + нь
f.    + нете       o.    + ньте
g.    + нут        p.    + нув
h.    + #          q.    + нувши
i.    + ла         r.    + ши

Definition and Description of Class V7
Fig. 3-19


of  every  new  verb  in  class  V7  is  also  printed  out,  making  it  possible  to 
identify  a  verb  where  an  affix  other  than  the  null  affix  would  be  factored 
from a stem with a null desinence. Such a form, which is an anomalous stem,
is also marked with a "2" in character position 12 of the organized word.

When  the  dictionary  of  approximately  30,000  entries  was  initially 
scanned with this policing program, only seven entries had to be marked with
a "2" in character position 12 of word 26. One of the seven entries was
in class V7.

5. The Word Analyzer Programs

Three machine programs have been written to interpret the affixes
of nouns, adjectives, and verbs (the noun analyzer, adjective analyzer, and
verb analyzer, respectively). The nonproductive classes of words, those in
which there is a limited and known number of words, such as pronouns and
numerals, are processed by the adjective analyzer. Three separate programs
were written only because of the restrictive size of the internal memory of
the Univac I. Conceptually, the three programs are a single comprehensive
program.

Since look-up requires a distinct run preceding the three affix
interpreting programs, it is necessary under present conditions to copy
every item found in the dictionary onto an output tape, even though it is
known that about 20% of the items will eventually be rejected during the
homograph deletion phase (Fig. 3-1). These extra items have to be processed
several times before they are eliminated: in dictionary look-up, in the
three analyzer passes, in sorting back to text order, and finally in deleting
homographs. By far the most time-consuming of these passes is the sorting
run.

If a larger internal memory were available, the present decomposition
into many separate programs would not be necessary. The 30-word items could
be analyzed at the same time as they were being looked up in the dictionary.
In the event that a homographic set were looked up, only the correct member
or members of the set would be kept. If all the members of the set were
incompatible, an artificial 30-word item could immediately be generated to
indicate that fact. Conceptually, therefore, dictionary look-up, the
analyzing runs, and the homograph delete run could be carried out in a single
pass, and indeed this is possible on any of the several machines now
available with a larger memory than that of the Univac I.

The three affix analyzer programs have been made as uniform as
possible. The same symbols have been used in the three flow charts to
describe the actions of each program (Appendix D). The grammatical functions,
as determined on a word-by-word basis, are stored in words 24 and 27 of the
augmented texts (Fig. 3-17). The arrangement of grammatical information
for nouns, adjectives, and verbs will be given in Tables 3-7 to 3-9. The
format for pronouns has been described by Matejka [17] and Coppinger [18], and
the format for numerals by Magassy [19]. Matejka [20] has illustrated the format
for prepositions, one of the classes of indeclinable words.

If a 30-word item is found to be incompatible, that is, the stem
and affix of the text word do not correspond to the dictionary entry that
was found by stem comparison, this is indicated by the same set of marks by
all three analyzer programs. In terms of the idealized stem dictionary, an
incompatible item would be one whose reduced lexical attribute vector is
all zeros. An example of an incompatible item was shown in the last example
of Sec. 3 of this chapter; "подхода" is identified by the mood and tense
coding as not being an inflected form of a verb paradigm. To eliminate such
a 30-word item from further consideration in syntactic analysis, the symbol
"INCOMPAT A" is put into word 24.

Several other similar symbols are used. Since an indeclinable word
has a one-member paradigm, an indeclinable item is incompatible if the affix
stored in the dictionary is not identical with the affix of the text word.
The symbol "INCOMPAT I" is used to denote this condition. Adjectives and
verbs are tested for voice (affixes of order zero). If the voice coding is
inconsistent with the affix of order zero, then the symbol "INCOMPAT R" is
placed in word 24. Lastly, the symbol "INCOMPAT Z" is placed into word 24
if the item belongs to a class that cannot be analyzed automatically and is
indicated by a class marker greater than 75.

It should be noted that "INCOMPAT A" is of a higher priority than
"INCOMPAT R"; that is, if an item is incompatible in the sense of both the
symbols "INCOMPAT A" and "INCOMPAT R", the former symbol is placed in word
24. An item with a class marker greater than 75 can be marked "INCOMPAT A"
instead of "INCOMPAT Z" only if the affix of the word does not correspond
to  any  of  the  affixes  tested  for  in  the  various  analyzer  programs.  (See 
the  descriptions  of  the  individual  programs.)  This  priority  exists  because 
the  affixes  are  checked  first  by  the  analyzer  programs. 
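
The priority among the incompatibility symbols can be pictured with a short Python sketch. It is only an illustration under stated assumptions, not the Univac I routine: the function name and its three arguments are hypothetical, and the relative order of the "INCOMPAT Z" and "INCOMPAT R" tests is assumed, since the text fixes only that the affix test comes first.

```python
from typing import Optional

CLASS_MARKER_LIMIT = 75  # classes above this cannot be analyzed automatically


def incompat_symbol(affix_compatible: bool,
                    class_marker: float,
                    voice_consistent: bool) -> Optional[str]:
    """Return the symbol for word 24, or None if the item is compatible.

    All three arguments are hypothetical summaries of tests that the
    analyzer programs carry out inside their logical trees.
    """
    # The affix is checked first, so "INCOMPAT A" outranks the others.
    if not affix_compatible:
        return "INCOMPAT A"
    # Assumed order: the class marker test precedes the voice test.
    if class_marker > CLASS_MARKER_LIMIT:
        return "INCOMPAT Z"
    if not voice_consistent:
        return "INCOMPAT R"
    return None
```

An item that fails both the affix test and the voice test therefore receives "INCOMPAT A", as the text requires.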

Since the three analyzer programs exist at present as separate
programs  in  the  Continuous  Dictionary  Run,  they  will  be  discussed 
individually. 

A.  Noun  Analyzer  Program 

The noun analyzer program analyzes only noun morphological types,
whose formal definition is given by the letter "N" in character position 1
of word 3 of a 30-word item. All other items on an input tape are copied
directly without modification.

The  logic  of  the  program  is  expressed  in  a  tree  structure  (Flow  Chart 
1 of Appendix D). The first branching within the tree is determined by the
affix of the noun. The fastest way of recognizing the affix is to compare
the affix of the text word with a complete list of affixes that can occur
legitimately in the various noun classes. To reduce the time spent in this
search,  the  list  has  been  ordered  so  that  the  most  frequently  occurring 
affixes  appear  at  the  head  of  the  list  (see  Sec.  8). 
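
The benefit of ordering the affix list by frequency can be checked with a small Python sketch. The affixes and frequencies below are invented for illustration and are not taken from Sec. 8; the function names are likewise hypothetical.

```python
from typing import Dict, List, Optional


def order_by_frequency(frequencies: Dict[str, float]) -> List[str]:
    """Put the most frequently occurring affixes at the head of the list."""
    return sorted(frequencies, key=frequencies.get, reverse=True)


def comparisons_needed(affix: str, ordered: List[str]) -> Optional[int]:
    """Count the comparisons a sequential search makes before it matches."""
    for count, candidate in enumerate(ordered, start=1):
        if candidate == affix:
            return count
    return None  # the affix is not legitimate for any class


def expected_comparisons(frequencies: Dict[str, float],
                         ordered: List[str]) -> float:
    """Average number of comparisons, weighted by affix frequency."""
    total = sum(frequencies.values())
    weighted = sum(frequencies[a] * comparisons_needed(a, ordered)
                   for a in ordered)
    return weighted / total


# Hypothetical relative frequencies, for illustration only.
SAMPLE = {"а": 0.30, "ы": 0.10, "ов": 0.15, "#": 0.45}
```

With these invented figures the frequency-ordered list needs about 1.9 comparisons per word on average, against 2.75 for the list in its original order.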

After  the  program  branches  on  the  affix,  an  appropriate  subtree  is 
entered. The usual order of branching in the subtree, as a matter of
efficiency, is by class marker and then by character positions 3 and 4 of
the organized word. To reduce the number of instructions in the program,
the integral component of the class marker is identified before the fractional
component. Similarly, the fourth character position of the organized word
is usually tested prior to the third character position.

Character position 3 indicates whether the noun is animate or inanimate.
Character position 4 divides the animate or inanimate classes further. By
this subdivision the 38 classes created by the class markers are increased
to 108, that is, there are 108 distinct paradigm classes for noun types.
If the idealized dictionary were being used, there would be 108 different
definitions of H for the morphological type of nouns alone.

Before the analysis of the noun is started, the word is tested for
an anomalous stem, which is signified by a "1" in character position 12 of
the organized word. If a "1" is found, then the affix of the text word is
compared with the affix stored in the dictionary entry. If there is no
match, this means that the dictionary item cannot represent the text word.
The item is labeled "INCOMPAT A" and the process is terminated. If there
is a match, or if the word is not an anomalous stem, the analysis of the
word is started. Throughout the tree there are further tests for anomalous
stems at various levels of branching.
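
The initial anomalous stem test amounts to a single guarded comparison. The Python sketch below restates it under assumptions: the function name and arguments are hypothetical, and the affixes are passed as plain strings rather than as fields of a 30-word item.

```python
from typing import Optional


def anomalous_stem_precheck(position_12: str,
                            text_affix: str,
                            dictionary_affix: str) -> Optional[str]:
    """Return "INCOMPAT A" if an anomalous stem rules the item out.

    position_12 is the mark in character position 12 of the organized
    word; a "1" signifies an anomalous stem. A result of None means the
    analysis of the word may start.
    """
    if position_12 == "1" and text_affix != dictionary_affix:
        return "INCOMPAT A"
    return None
```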




If a terminal of the tree, indicating compatibility between the stem
and affix, is reached, the case and number are entered in word 24 of the
30-word item, where a character position is reserved for each case and number
combination (Table 3-7) (also see Fig. 3-17). The case coding was chosen
to be mnemonic and the machine word is divided into two sections to express
number, the first six characters representing the singular and the last six
the plural. The gender is inserted into word 27 in the character position
corresponding to the related information on case and number (Table 3-8).


Character  1:  N - if nominative singular
Character  2:  G - if genitive singular
Character  3:  A - if accusative singular
Character  4:  C - if dative singular
Character  5:  I - if instrumental singular
Character  6:  P - if prepositional singular
Character  7:  N - if nominative plural
Character  8:  G - if genitive plural
Character  9:  A - if accusative plural
Character 10:  C - if dative plural
Character 11:  I - if instrumental plural
Character 12:  P - if prepositional plural

Format of Word 24 of Augmented Text with
Information on Case and Number for Noun
and Adjective Morphological Types

TABLE 3-7


M  -  masculine 
F  -  feminine 
N  -  neuter 

B - masculine or neuter (adjective types only)
A - masculine, feminine, or neuter

Allowable Characters in Word 27 of Augmented Text for
Gender of Noun and Adjective Morphological Types

TABLE 3-8

The unused characters of words 24 and 27 are filled with zeros. These unused
characters are sometimes changed to spaces or dashes before appearing on output
lists designed for detailed linguistic study. Multiple lexical attributes
are indicated by the presence of more than one identifying character in
words 24 and 27.
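
The layout of words 24 and 27 can be made concrete with a short Python sketch. It is an illustration under assumptions, not the original coding routine: the function name and the form of its input (a list of case, number, and gender triples) are hypothetical, while the letters and positions follow Tables 3-7 and 3-8.

```python
from typing import Iterable, List, Tuple

CASE_LETTERS: List[str] = ["N", "G", "A", "C", "I", "P"]  # "C" stands for dative
GENDER_LETTERS = {"M", "F", "N", "B", "A"}


def encode_case_number_gender(
        readings: Iterable[Tuple[str, str, str]]) -> Tuple[str, str]:
    """Build words 24 and 27 from (case, number, gender) readings.

    number is "sg" or "pl"; unused character positions stay "0".
    """
    word24 = ["0"] * 12
    word27 = ["0"] * 12
    for case, number, gender in readings:
        if (case not in CASE_LETTERS or number not in ("sg", "pl")
                or gender not in GENDER_LETTERS):
            raise ValueError(f"unknown reading: {case}, {number}, {gender}")
        # The first six positions are singular, the last six plural.
        position = CASE_LETTERS.index(case) + (0 if number == "sg" else 6)
        word24[position] = case
        word27[position] = gender
    return "".join(word24), "".join(word27)
```

A neuter noun that is either nominative or accusative singular is coded as "N0A000000000" in word 24 and "N0N000000000" in word 27; the two identifying characters express the multiple lexical attributes.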

A "C" rather than a "D" was used to represent the dative case because
the letter "C", like the letters "N", "G", "A", "I", and "P", can be used
as an extractor in the Univac I, but "D" cannot.

B. Adjective Analyzer Program

In addition to analyzing the adjective morphological types (participles
are listed as adjectives in the experimental dictionary), which are
identified by the letter "A" in character position 1 of word 3 of a 30-word
item, the adjective analyzer program processes all the nonproductive
morphological types of Russian words, for example, pronouns, numerals, and
prepositions. The other items on an input tape are copied directly, without
modification.

The logic of this program is expressed in a tree structure (Flow
Chart 2 of Appendix D) similar to the tree of the noun program. After the
initial anomalous stem test the first branching within the tree is determined
by the affix of the adjective, and the adjectival affix list is ordered on
frequency of occurrence.

After the program branches on the affix, there is only a single
comparison on the integral component of the class marker in the subtree.
With the exception of the anomalous stem tests, which are scattered
throughout the program, this comparison determines the grammatical information
completely. When a compatible terminal of the tree is reached, the program
inserts the case and number information into word 24 of the 30-word item,
and the gender into word 27 of the same item in a format identical to the
noun format (Tables 3-7 and 3-8).

Another set of marks is added to the 30-word items of short form
and comparative adjectives. Any adjective with the affix "ее" is marked
with a "1", and any adjective with the affix "е" is marked with a "2" in
character position 8 of the organized word. This indicates that the
adjective may function as a comparative adverb in a sentence. All short forms
are marked in character position 9 of the organized word. Those with affixes
"а", "#", or "ы" are marked with a "1" to indicate that the adjectives may
function as verbs. Short forms with affixes "е" or "о" are marked with a
"2" to indicate that the adjectives may function as verbs or adverbs. Forms
with the affix "и" are marked with a "3" to indicate that they may function
as adverbs. The markings are summarized in Table 3-9. The main advantage
derived from this marking is that the dictionary need not be cluttered with
a large number of adverbs, genuinely homographic with adjective entries.

If an adjective ending in "ее" can be used only comparatively, the
mark "INCOMPAT EE" is placed in word 24 to distinguish it from other
incompatible adjectival forms. Such forms are incompatible in the sense
that no case, number, and gender can be assigned to them.

Character 8:  1 - if adjective ends in "ее"
              2 - if adjective ends in "е"

Character 9:  1 - if adjective ends in "а", "#", or "ы"
              2 - if adjective ends in "е" or "о"
              3 - if adjective ends in "и"

Allowable Characters in Character Positions 8 and 9 of the
Organized Word for Adjectival Morphological Types

TABLE 3-9

Only affixes of order one are compared in the adjectival tree, since
the affixes of order zero refer only to the voice of the adjectives, which
is tested only if the initial analysis is successful. The adjectival entries
in the dictionary are marked to indicate whether the 30-word dictionary items
are reflexive (R), nonreflexive (0), or both reflexive and nonreflexive
items (Δ). If the symbol "R" or "0" is found (in character position 7 of
word 26), the affix of order zero is checked for correspondence. If the
affix matches, an "R" or a "0" is placed in character position 11 of word
26. If the affix does not match, the previous grammatical information is
erased and the symbol "INCOMPAT R" is put into word 24 instead. If a
reflexive affix of order zero is found, an additional test is made. Passive
participles and nonparticipial adjectives cannot be reflexive; therefore
character position 10 of the organized word is tested for an active
participle. If an active participle is not found, the item is incompatible.

It is important to distinguish between the functions of the characters
in positions 7 and 11 in the organized word. The character in position
7 indicates whether a reflexive or nonreflexive adjective is permitted by
that dictionary entry, while the character in position 11 indicates whether
the adjective is reflexive or nonreflexive. As an illustration, consider
the typical adjective with a null affix of order zero and a delta (Δ) in
character position 7 of the organized word. Sensing the delta, the program
does not check whether or not the voice of the adjective is compatible.
However, a zero (0) is placed into character position 11, so that a future
program can immediately sense the voice of the adjective.
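
The interplay of character positions 7 and 11 can be sketched in Python. This is an illustration under assumptions rather than the original program: the function name and arguments are hypothetical, the delta mark is written as "Δ", and the active participle test is assumed to apply whenever a reflexive affix is found, whatever the mark in position 7.

```python
from typing import Optional, Tuple

REFLEXIVE = "R"
NONREFLEXIVE = "0"
EITHER = "Δ"  # the delta mark: both voices are permitted by the entry


def check_adjective_voice(position_7: str,
                          text_is_reflexive: bool,
                          is_active_participle: bool
                          ) -> Tuple[bool, Optional[str]]:
    """Return (compatible, mark for character position 11).

    position_7 states which voice the dictionary entry permits;
    text_is_reflexive states whether the text word carries a reflexive
    affix of order zero.
    """
    actual = REFLEXIVE if text_is_reflexive else NONREFLEXIVE
    # With "R" or "0" in position 7 the affix of order zero must agree.
    if position_7 in (REFLEXIVE, NONREFLEXIVE) and position_7 != actual:
        return False, None  # the item would be labeled "INCOMPAT R"
    # Only active participles can be reflexive.
    if text_is_reflexive and not is_active_participle:
        return False, None
    # Position 11 records the voice actually found in the text.
    return True, actual
```

For the typical adjective with a delta and a null affix of order zero, the sketch returns a compatible result with "0" for position 11, matching the illustration in the text.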


The last test of adjectival morphological types determines whether
a word such as портной functions only as a noun. In certain cases,
depending on the affix and the animateness of the adjective (Table 3-10),
the word cannot be in the accusative case. If the morphological adjective
functions only as a noun, the accusative case lexical attribute can be
eliminated 45% of the time.


Animate 

Inanimate 

Case 

Case 

Affix 

Frequency 

and  Number 

Affix 

Frequency 

and  Number 

Eliminated 

Eliminated 

-oft 

8.17- 

As 

HUX 

9.1% 

Ap 

5.0 

Ap 

-oro 

6.1 

Ap 

-00 

3.9 

As 

-HX 

4.6 

-we 

2.8 

-ero 

1.3 

As 

-iifi 

2.0 

As 

21.1  io 

-k9l 

1.6 

As 

-ee 

1.3 

As 

24.770 

Total  45.8% 

Expected Frequency of Occurrence of Affixes Which Can Reduce
Ambiguity with Adjectives Used as Nouns

TABLE  3-10 

Since  the  pronouns  and  numerals  are  nonproductive  types,  that  is  to 
say,  there  is  a  finite  and  small  group  of  each  in  the  Russian  language,  it 
is  not  practical  to  write  a  program  to  analyze  the  words.  It  is  simpler  to 
code  the  grammatical  functions  of  these  words  directly  when  preparing  the 
30-word items for the dictionary. These words are therefore stored
and  looked  up  as  inflected  forms.  The  adjective  analyzer  simply  transposes 
the  stored  information  into  words  24  and  27  of  the  augmented  text. 


During look-up, indeclinable words, that is, words with the letter
"I" in character position 1 of word 3, are selected on the basis of stem
comparison only. The adjective analyzer therefore compares the affixes and
passes only those items where the dictionary affix and text affix match.
Otherwise the symbol "INCOMPAT I" is inserted into word 24.

In  addition,  if  an  indeclinable  noun  or  an  adjectival  or  nominal 
abbreviation  is  found,  word  24  is  filled  with  "NGACIPNGACIP”  and  word  27 
with  "AAAAAAAAAAAA" ,  indicating  that  the  item  might  be  used  in  any  case, 
number,  and  gender  whatsoever. 
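
Reading words 24 and 27 back shows why this filling leaves every possibility open. The Python sketch below is a hypothetical decoder, not part of the original programs; it assumes 12-character words laid out as in Tables 3-7 and 3-8.

```python
from typing import List, Tuple

CASE_NAMES = {
    "N": "nominative", "G": "genitive", "A": "accusative",
    "C": "dative", "I": "instrumental", "P": "prepositional",
}


def decode_case_number_gender(word24: str,
                              word27: str) -> List[Tuple[str, str, str]]:
    """List every (case, number, gender) reading coded in words 24 and 27."""
    if len(word24) != 12 or len(word27) != 12:
        raise ValueError("words 24 and 27 must hold 12 characters each")
    readings = []
    for position, (case, gender) in enumerate(zip(word24, word27)):
        if case in CASE_NAMES:  # zeros, spaces, and dashes are skipped
            number = "singular" if position < 6 else "plural"
            readings.append((CASE_NAMES[case], number, gender))
    return readings
```

Decoding "NGACIPNGACIP" with "AAAAAAAAAAAA" yields all twelve case and number combinations, each with the gender mark "A".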

C. Verb Analyzer Program

The verb analyzer program, the last of the three analyzer programs,
analyzes only verb items, whose formal definition is given by the letter "V"
in character position 1 of word 3 of a 30-word item. All other items on
an input tape are copied directly, without modification.

The logic of this program is expressed in a tree structure (Flow
Chart 3 of Appendix D) similar to the tree structures of the noun and
adjective analyzer programs. After the initial anomalous stem test, the first
branching is determined by the affix of the verb, which is compared with the
ordered list of affixes in the program. As with adjectives, only the affixes
of order one are compared. For programming ease, the subtree entered after
the first branching compares first on the integral portion of the class
marker and then on the fractional portion of the class marker.

If a verb is identified as being in either the present or the future
indicative, the ambiguity is resolved by checking character position 2 of
the organized word (Table 3-11).

H - imperfective aspect
S - perfective aspect
D - momentary action (perfective)
U - iterative action (imperfective)
K - perfective or imperfective aspect

Notation of Character Position 2 of Word 26 for Verb Entries

TABLE 3-11

In most branches of the logical tree of the verb program the lexical
attributes can be determined from the affix and class marker alone. The
tense and mood coding in the third semiorganized word is used as a check to
ensure that the function to be assigned to the stem is an allowable function.
In a few branches, however, the tense and mood code must be used to help
determine the grammatical functions. (See in particular the subtree of the
affix "M".)

The markings for verbs differ significantly from those for nouns and
adjectives (Table 3-12). The first six character positions in word 24 are
reserved for person and number. The person and number of verbs in the
present or future tenses are indicated by the appropriate character in any
one of the first six character positions. Since for verbs in the
past tense the person cannot be determined from the morphological
characteristics, either all of the first three or all of the second three
character positions are filled to designate number. For all verbs, the tense
is given in character position 7, the gender is given in character position
8, and the mood in character position 9. The affix of order zero of the verb
is checked to determine its voice, which is noted in character position 10.
The only type of essential homography present within verb forms is the dual
interpretation of second person plural indicative and plural imperative of
some verbs ending in the string "ите". The former interpretation is
displayed in word 24, but an "X" is inserted into character position 11 to
denote the homography.

Characters 1-6

Option A: (Present and future tenses)

    V in character position 1 = 1st person singular
    Z in character position 2 = 2nd person singular
    T in character position 3 = 3rd person singular
    V in character position 4 = 1st person plural
    Z in character position 5 = 2nd person plural
    T in character position 6 = 3rd person plural

Option B: (Past tense)

    SSS in character positions 1-3 = 1st, 2nd, or 3rd person singular
    PPP in character positions 4-6 = 1st, 2nd, or 3rd person plural

Characters 7-12

     7:  A = past (tense)
         B = present
         C = future
         X = present or future

     8:  M = masculine (gender)
         F = feminine
         N = neuter
         A = any

     9:  D = indicative (mood)
         E = imperative
         F = infinitive
         G = gerund

    10:  R = reflexive (voice)
         0 = nonreflexive

         (NOTE: This voice coding should not be confused with
         the same symbols used in the organized word where
         information is stored in advance of which voice the
         verb can take. This coding states the voice of the
         verb in each specific occurrence.)

    11:  X = special situation among some verbs with affix "ите"
         which can be both 2nd person plural indicative and
         plural imperative.

    12:  Not used

         (NOTE: If a character position is not applicable, it
         is filled with a space. If a character position is
         used in the negative sense (e.g., not 1st person
         singular), it is filled with a zero which is later
         modified to a dash.)

Format of Word 24 of Augmented Text with Information
on Person, Number, Tense, Gender, Mood, and Voice for Verb Morphological Types

TABLE 3-12

If a verb passes through a compatible terminal of the tree, the voice
is checked for compatibility. (See Sec. 5B for the details.) The "R-0-Δ"
marks are in character position 3 of word 26 of verbal forms.
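
The verb format of word 24 can be illustrated with a Python sketch. It is hypothetical in its name, arguments, and defaults, and it covers only finite forms with a person or number reading (not infinitives or gerunds). It also assumes that the person positions left unused in the past tense hold zeros and that positions 11 and 12 hold spaces when not applicable.

```python
from typing import Optional

PERSON_LETTERS = {1: "V", 2: "Z", 3: "T"}


def encode_verb_word24(tense: str, number: str,
                       person: Optional[int] = None,
                       gender: str = "A", mood: str = "D",
                       voice: str = "0",
                       homograph: bool = False) -> str:
    """Build the 12 characters of word 24 for a finite verb form.

    tense is "A" (past), "B" (present), "C" (future), or "X";
    number is "sg" or "pl"; person is needed outside the past tense.
    """
    if number not in ("sg", "pl"):
        raise ValueError("number must be 'sg' or 'pl'")
    slots = ["0"] * 6  # zero: the position is used in the negative sense
    offset = 0 if number == "sg" else 3
    if tense == "A":
        # Past tense: person is indeterminate, so three positions are filled.
        fill = "S" if number == "sg" else "P"
        slots[offset:offset + 3] = [fill] * 3
    else:
        if person not in PERSON_LETTERS:
            raise ValueError("person must be 1, 2, or 3 outside the past tense")
        slots[offset + person - 1] = PERSON_LETTERS[person]
    # Positions 7-12: tense, gender, mood, voice, homography mark, unused.
    tail = [tense, gender, mood, voice, "X" if homograph else " ", " "]
    return "".join(slots + tail)
```

A third person singular, present tense, indicative, reflexive verb of undetermined gender is coded as "00T000BADR" followed by two spaces.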

It is necessary to note that character position 3 in word 26 and
character position 10 in word 24 of verbal forms correspond to character
positions 7 and 11 in the organized word of adjectival forms. The reason
for using different character positions is purely historical. At the time
that this information was inserted into the experimental dictionary, some
of the character positions had already been coded with other information.
These codes had to remain frozen to avoid considerable reprogramming. It
would be highly desirable to use the same character positions for both
adjectival and verbal forms when reprogramming the dictionary for production
purposes.

The small set of verbal forms in which there is artificial factoring
that generates a spurious affix of order zero (see Sec. 4B) has to be
handled in a special way by the verb analyzer program. Before the main tree
is entered, character position 12 of word 26 is checked for a "2". If it
is found, and the text word has a non-null affix of order zero, the item is
tested in a special tree, since the affix will not be analyzed correctly
otherwise. This character position is tested again before the test for
reflexivity is carried out, since any verb with a "2" is nonreflexive.

6.  Output  of  the  Continuous  Dictionary  Run 

The following sentence from one of the texts in the Harvard tape
library will be used to illustrate the output of the Continuous Dictionary
Run:

"Это флуктуирующее напряжение называется обычно в радиотехнике шумом,
а относительная точность измерения исследуемого напряжения характеризуется
величиной отношения напряжения полезного сигнала к среднему квадратичному
напряжению шума."

Figure 3-20 shows the sentence in texthadic format. The
analyzed items are displayed in Fig. 3-21, and the sentence is shown in
Fig. 3-22 in final form, after the homographs have been deleted. All the
ambiguities that can be resolved by an analysis on a word-by-word basis have
been removed. The resolution of the remaining ambiguities is a task left
to a more sophisticated program (see Chap. 5).

As a result of the word-by-word analysis, the following information
is coded in columns 6 and 7 of the texthadic format (Fig. 3-22): The pronoun
"это", the adjective "флуктуирующее", and the noun "напряжение" are neuter
and either nominative singular or accusative singular. The adjective, in
addition, can function adverbially. The verb "называется" is third person
singular, present tense, indicative, and reflexive, while the gender is
undetermined. Following it is the short form adjective "обычно", which can
function verbally or adverbially. The next word, the preposition "в",
governs the accusative or the prepositional case. Next is the essential
homograph pair of the noun "радиотехнике", as indicated by the "1" and "2"
following the text serial number. The first member of the pair is
prepositional singular masculine, while the second member is feminine and
either dative or prepositional singular. The next noun, "шумом", is
instrumental singular masculine.

After the comma is the conjunction "а", which precedes the adjective
"относительная", which is nominative singular feminine. The noun "точность"
is either nominative or accusative singular and feminine, and the next one,
"измерения", is neuter and either genitive singular, nominative plural, or
accusative plural. The adjective "исследуемого" is genitive or accusative
singular. If genitive, it can be masculine or neuter; but if accusative,
it can only be masculine. The coding for the noun "напряжения" is like
that for "измерения". The verb "характеризуется" is third person singular,
either present or future tense, indicative, reflexive, and the gender is
undetermined. The following noun, "величиной", is instrumental singular
feminine.

Sentence from Text after Dictionary Look-up
Fig. 3-20

Sentence from Text after Analyzer Routines
Fig. 3-21

Augmented Text of Sample Sentence
Fig. 3-22

The coding for the noun "отношения" is similar to that for
"напряжения", which has been described previously, while that for the
adjective "полезного" is similar to that for "исследуемого". The noun
"сигнала" is genitive singular masculine. The preposition "к", which
governs the dative case, follows, preceding the two adjectives "среднему"
and "квадратичному", which are dative singular and either masculine or
neuter. The next noun, "напряжению", is dative singular neuter, and the
last word, the noun "шума", is genitive singular masculine.

Of the seven homograph sets* contained in the sentence, six were
resolved by the analyzer programs as follows (Fig. 3-22):

The two dictionary entries for "называется" differ in the third
character position of the organized word. One entry was intended for
reflexive forms of the verb and the other entry for nonreflexive forms.

The homographic pair "радиотехнике" is an essential homograph.
Since the homograph cannot be resolved without a consideration of context,
its resolution is left to a future program.

* A homograph set consists of two or more dictionary entries, looked up
by the same inflected form, that are successfully analyzed by the
analyzer programs.
There are two sets of three homographs referring to the same dictionary
stems, "шумом" and "шума". In both cases, both the adjectival and verbal
stems are incompatible, leaving only a single compatible nominal entry. The
adjectival stem is an example of a stem automatically marked by the anomalous
stem routine (note the "1" in character position 12 of column 5).

The next homographic pair is resolved, since the indeclinable
dictionary entry refers only to "относительн-о" and not to "относительн-ая".
The next homographic pair is resolved in the same manner, the indeclinable
entry referring only to "полезн-о" and not to "полезн-ого".

The verbal entry for "сигнала" is rejected, since the affix "а" in
a verb is an indication of a past tense and there is no signal in column 8
that a past tense (B3) can occur with the stem "сигнал".


7.  Reliability  of  the  Harvard  Automatic  Dictionary 

The reliability of the Harvard Automatic Dictionary and of the look-up
routines constituting the Continuous Dictionary Run is tested periodically
by means of the output of Frequency Runs (Ref. 22). A list, containing every
distinct inflected form from every text in the Harvard tape library, together
with the frequency of occurrence of each form, is kept on tape. (Ref. 23
contains a list of all texts in the tape library.) The latest test,
Frequency Run V, processed in January 1960, was based on 104,097 words of
text consisting of 14,698 distinct inflected forms.

A selection from the output of the latest test run is shown in
Fig. 3-23. Several items of special interest that appear on this excerpt
are two homograph sets, "металл-а" and "ueTRjiJiHfw"; a problem set,* "M8'i-OB";
a misspelled word, "ueTajunweoji-Jix"; as well as four words that were missing
from the dictionary. Two of these missing words have been analyzed by the
missing word analyzer (Ref. 21), "металлостеклянн-ой" and "металлургическ-ой",
while the other two, "метеор-ов" and "метеор-ы", could not be analyzed by that
routine and are listed as missing words.

Sample of Main Output of Frequency Run V
Fig. 3-23

To find errors in the output, three supplementary lists were produced:
a list of homograph sets (Fig. 3-24), a list of problem sets
(Fig.  3-25),  and  a  list  of  all  the  incompatible  items  from  the  main  output 
sorted  by  class  (Fig.  3-26). 
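
The three lists can be derived mechanically from the analyzer output. The following Python sketch is illustrative only and is not the Harvard program: it assumes that each looked-up item has been reduced to a hypothetical triple (inflected form, dictionary entry, compatibility flag). It treats a form as a homograph set when two or more of its entries are analyzed successfully, and as a problem set when every one of its entries is incompatible.

```python
from collections import defaultdict


def classify_lookups(items):
    """Split analyzer output into homograph sets, problem sets, and
    the list of incompatible items.

    items: iterable of (inflected_form, entry_id, compatible) triples.
    The triple layout is an assumption made for this sketch.
    """
    by_form = defaultdict(list)
    for form, entry_id, compatible in items:
        by_form[form].append((entry_id, compatible))

    homograph_sets = {}   # form -> compatible entries (two or more)
    problem_sets = {}     # form -> entries, all of them incompatible
    incompatible = []     # every incompatible (form, entry) pair

    for form, entries in by_form.items():
        good = [entry for entry, ok in entries if ok]
        bad = [entry for entry, ok in entries if not ok]
        incompatible.extend((form, entry) for entry in bad)
        if len(good) >= 2:
            homograph_sets[form] = good
        elif not good:
            problem_sets[form] = bad
    return homograph_sets, problem_sets, incompatible


# Hypothetical data: form_1 has two compatible entries, form_2 has none.
sample = [
    ("form_1", "noun_entry", True),
    ("form_1", "verb_entry", True),
    ("form_1", "adj_entry", False),
    ("form_2", "noun_entry", False),
    ("form_3", "noun_entry", True),
    ("form_3", "verb_entry", False),
]
homographs, problems, rejected = classify_lookups(sample)
```

A form with exactly one compatible entry, such as form_3 here, appears on neither set list, although its rejected entries still reach the incompatible list.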

Only a single error was noted on the list of incompatible items.
This error was noted again on the list of problem sets. The information
gleaned from the homograph set list and the problem set list is summarized
in Tables 3-13 and 3-14. The data refers to the distinct inflected forms
as  well  as  to  the  text  occurrences,  so  that  a  clear  picture  of  the  magnitude 
of  errors  in  the  dictionary  and  the  associated  routines  can  be  discerned. 

The  homographs  that  were  found  in  the  output  of  Frequency  Run  V  have 
been  classified  into  six  groups.  The  first  and  by  far  the  largest  group 
consists  of  the  essential,  or  genuine,  homograph  sets.  One  member  of  every 
homograph  set  in  the  second  group  is  a  short  form  adjective  whose  existence 
is  questionable  but  which  has  been  left  in  the  dictionary,  since  there  is 
as  yet  no  reliable  source  of  information  on  this  subject. 

The homograph sets in the third group are due to duplicate entries
in the dictionary, whereas those in the fourth group are caused by coding

* A problem set consists of one or more dictionary entries which have been
looked up by the same inflected form, and which all have been identified
as incompatible items by the analyzer programs.
Problem Sets from Frequency Run V
Fig. 3-25

Section from Incompatible List, Frequency Run V
Fig. 3-26
                                  Distinct
                                  Inflected           Text
                                  Forms               Occurrences

1. Essential homographs           165    46%          4,214    84%
2. Short form adjectives           83    23%            360     7%
3. Duplicates in dictionary        61    17%            254     5%
4. Dictionary coding errors        44    12%            156     3%
5. Words not in classes             4     1%              6
6. Analyzer errors                  1                    15
                                  ---                 -----
                                  358                 5,005

358 out of 14,698 distinct inflected forms (2.4%)
5,005 out of 104,097 words of text (4.8%)

Summary of Homograph Set List, Frequency Run V, January 1960

TABLE 3-13
                                  Distinct
                                  Inflected           Text
                                  Forms               Occurrences

1. Words missing from dictionary   62    41%            203    56%
2. Typographical errors            63    42%             72    20%
3. Dictionary coding errors        23    15%             80    22%
4. Analyzer errors                  2     1%              9     2%
                                  ---                   ---
                                  150                   364

150 out of 14,698 distinct inflected forms (1.0%)
364 out of 104,097 words of text (0.3%)

Summary of Problem Sets, Frequency Run V, January 1960

TABLE 3-14
errors in the dictionary. Homograph sets originating from words that cannot
be classified into normal classes (words with class markers greater than 75)
are listed in the fifth group, while the last group is reserved for
homograph sets caused by errors in the analyzer programs.

While  the  homographs  in  groups  1,  2,  and  5  are  considered  essential 
at  present,  the  errors  that  caused  the  other  homograph  sets  have  been 
corrected. 

Examples of homograph sets belonging to the first five groups may
be found in Fig. 3-24. The pertinent groups have been marked to the right
of the column containing the transliterated Russian word. The assignment
of the homograph sets to the six groups is self-evident, perhaps with the
exception of the homograph sets with the verb stem "flyu". This verb can
exist only in the reflexive voice, but the dictionary entry was not
appropriately marked in character position 3 of the organized word.

The  data  of  Table  3-13  indicates  that  almost  5% of  the  words 
occurring  in  the  texts  on  the  Harvard  tape  library  refer  to  homographic 
dictionary  entries.  Although  any  given  homograph  set  is  a  function  of  the 
morphological  classes  that  have  been  assigned  to  the  individual  members  of 
the  set,  and  in  that  manner  a  function  of  the  organization  of  the  Harvard 
Automatic  Dictionary,  the  latitude  allowed  the  coders  is  not  great.  It  is 
therefore  likely  that  any  other  automatic  dictionary  would  have  to  be  capable 
of  handling  homograph  sets  that  occur  with  approximately  the  same  frequency. 

In the present dictionary, fewer than 0.5% of the words in texts
refer to homograph sets due to errors.
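
The percentages quoted from Table 3-13 can be checked directly. The short Python calculation below uses only figures read from the table; it assumes that the homograph sets "due to errors" are those of groups 3, 4, and 6 (duplicates, coding errors, and analyzer errors), which is the reading suggested by the text but is not stated explicitly.

```python
DISTINCT_FORMS = 14_698   # distinct inflected forms in Frequency Run V
TEXT_WORDS = 104_097      # running words of text in Frequency Run V


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


homograph_forms = 358      # total of the "distinct forms" column
homograph_words = 5_005    # total of the "text occurrences" column

# Assumption: groups 3, 4, and 6 are the error-induced homograph sets.
error_words = 254 + 156 + 15

share_of_forms = percent(homograph_forms, DISTINCT_FORMS)
share_of_words = percent(homograph_words, TEXT_WORDS)
share_of_errors = percent(error_words, TEXT_WORDS)

print(f"forms in homograph sets:  {share_of_forms:.1f}%")
print(f"words in homograph sets:  {share_of_words:.1f}%")
print(f"words in erroneous sets:  {share_of_errors:.2f}%")
```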

The problem sets have also been classified into groups (Table 3-14).
The first group consists of problem sets created by the absence of a text word
from the dictionary. If the stem of the text word is homographic with the
stem of another Russian word represented in the dictionary and subsequently
rejected by the analyzer programs, the problem set occurs. It is important
to note that not every text word missing from the dictionary results in a
problem set. The majority of words missing from the dictionary are, of
course, not homographic with the stem of another word. New words not
homographic with other stems are listed as missing words with no English
correspondents and no grammar codes, unless such codes can be assigned by the
missing word analyzer.

Another group of problem sets is due to typographical errors,
generated when the text is being typed onto a magnetic tape. Here, too, not
every word typed erroneously results in a problem set. Most appear as missing
words. A mistyped word can result in a problem set only in one of two
circumstances. Either the typographical error is in the affix and the
analyzer program cannot correlate the incorrect affix with the stem, or the
error is in a stem which coincidentally is identical to the stem of another
dictionary word.

The other two groups of problem sets are due to dictionary coding
errors and errors in the analyzer programs. All such errors discovered
through reference to the homograph list and the problem set list have been
corrected.

Examples of the first three types of problem sets are illustrated
in Fig. 3-25. The word "коль", an alternate form of "коли", is missing from
the dictionary, but is homographic with the same stem from the forms "коли"
and "колю", the latter from the paradigm of "колоть". Two misspellings are
on the list: "muqhho" was spelled "s^MeHino" and "квадрата" was spelled
"квадрато". The other two examples are due to dictionary errors. The
adjective "MOKpoHHMii" (form "mokpokhicio") was misclassified into class A1
instead of A5, while the abbreviation "xk-js" was listed as being indeclinable
when it can be declined, as "MH-ra".

Problem sets are created by text words extremely rarely (0.35%), and
those due to dictionary errors occur even more seldom (less than 0.1% of
the time).

8.  Frequency  of  Occurrences  of  Affixes 

Since  the  three  word  analyzer  programs  are  used  to  analyze  every 
Russian  word  of  the  noun,  adjective,  or  verb  morphological  types,  it  was 
desirable  to  resolve  several  statistical  questions  in  order  to  reduce  the 
time  involved  in  passing  through  the  logical  trees  of  these  three  programs. 

In  the  main  branch  of  each  program  the  affix  of  the  text  word  is 
compared  against  a  list  of  affixes  stored  in  memory.  If  the  affix  lists  are 
stored  in  order  of  decreasing  frequency  of  occurrence,  the  least  time  will 
be spent passing through the trees. Since the data that is processed by the
analyzer  programs  is  the  raw  output  of  dictionary  look-up,  the  statistics 
should  reflect  the  frequency  of  occurrence  of  all  30-word  dictionary  items, 
both  compatible  and  incompatible. 
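
The saving obtained by ordering the affix lists can be made concrete. The Python sketch below is an illustration under stated assumptions, not the analyzer code: it assumes a simple sequential scan in which an affix stored at position i costs i comparisons, and the affix labels and counts are invented for the example.

```python
def mean_comparisons(affix_order, frequency):
    """Average number of sequential comparisons needed to locate an
    affix when the stored list is scanned from the front."""
    total = sum(frequency.values())
    cost = sum(frequency[affix] * (position + 1)
               for position, affix in enumerate(affix_order))
    return cost / total


def order_by_frequency(frequency):
    """Return the affixes in order of decreasing frequency of occurrence."""
    return sorted(frequency, key=frequency.get, reverse=True)


# Hypothetical affix labels and occurrence counts.
counts = {"affix_1": 100, "affix_2": 100, "affix_3": 300, "affix_4": 500}

as_stored = list(counts)                 # arbitrary storage order
by_frequency = order_by_frequency(counts)

cost_as_stored = mean_comparisons(as_stored, counts)
cost_by_frequency = mean_comparisons(by_frequency, counts)
```

With these invented counts the average falls from 3.2 comparisons per word to 1.8 once the list is stored in order of decreasing frequency.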

Frequency Run V has already been considered in Sec. 7, where the
individual entries have been studied for indication of error. This data also
has been reduced to obtain the desired frequencies.

Every 30-word item in the analyzer output is compressed until only
the morphological type, affix, class marker, an index whether the item is
compatible or incompatible, and the frequency of occurrence of the item are
kept. This information is sorted and then accumulated (Table 1 of Appendix
E). In the table the three keys, in decreasing order, are the assigned
morphological type, the affix, and the class marker. The totals for each
summation are divided into compatible and incompatible items. The totals
for the affixes within the major morphological types have been sorted by
frequency of occurrence (Tables 2 to 4 of Appendix E). This is the order
in which the affixes must be listed in the analyzer programs to reduce the
scanning time.

The  figures  in  Table  1  in  Appendix  E  have  been  summarized  further 
in  Table  3-15.  It  must  be  noted  that  there  is  not  a  one-to-one  correspondence 
between  the  figures  in  Sec.  7  and  those  of  Table  3-15,  since  a  distinct 
inflected  form  may  refer  to  more  than  one  dictionary  entry. 


Morphological
Type               Total Entries    Compatible    Incompatible

Noun                    35,875         33,030           2,845
Indeclinable            32,166         27,271           4,895
Adjective               24,312         18,807           5,505
Verb                    22,265         10,200          12,065
Pronoun                  8,225          8,223               2
Numeral                  1,381          1,276             105
                       -------        -------         -------
                       124,224         98,807          25,417
                                      (79.5%)         (20.5%)

Miscellaneous           30,012
                       -------
                       154,236

Summary of Dictionary Entries Looked Up in Frequency Run V

TABLE 3-15
The selection of the analyzer programs and other related
procedures as the method for determining the compatibility and lexical
attributes of the various word types was based on the development of the
existing experimental system. The reduction in efficiency due to this
method can be determined by studying the ratio of incompatible items that
have to be carried through up to the homograph delete routine in the
Continuous Dictionary Run (Fig. 3-1). The 20.5% ratio is an indication
of the useless data being carried through the several routines. The necessity
for this could be eliminated by more efficient coding procedures and a larger
internal memory.

The  difficulties  caused  by  the  large  number  of  dictionary  stems  in 
each  verb  paradigm  are  pointed  out  by  the  statistic  that  almost  half  of  the 
incompatible items are verbs. The large number of stems is a result of the
affixes  factored  by  the  inverse  inflection  algorithm  (Sec.  2B). 
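
Both statistics follow from the compatible and incompatible columns of Table 3-15. The Python check below simply re-adds those columns; the dictionary layout is an assumption of the sketch, while the numbers are those of the table.

```python
# Morphological type -> (compatible, incompatible), from Table 3-15.
TABLE_3_15 = {
    "Noun": (33_030, 2_845),
    "Indeclinable": (27_271, 4_895),
    "Adjective": (18_807, 5_505),
    "Verb": (10_200, 12_065),
    "Pronoun": (8_223, 2),
    "Numeral": (1_276, 105),
}

compatible = sum(c for c, _ in TABLE_3_15.values())
incompatible = sum(i for _, i in TABLE_3_15.values())
total = compatible + incompatible

# Share of all looked-up entries that the analyzers later reject.
incompatible_ratio = 100.0 * incompatible / total

# Share of the rejected entries that are verb entries.
verb_share = 100.0 * TABLE_3_15["Verb"][1] / incompatible

print(f"{total} entries, {incompatible_ratio:.1f}% incompatible")
print(f"verbs account for {verb_share:.1f}% of the incompatible items")
```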

The 30,012 miscellaneous items that are appended to the main list
include  punctuation  marks,  editorial  comments  made  by  typists  during  text 
transcription,  and  words  that  were  not  found  in  the  dictionary.  A  rough 
estimate  of  the  number  of  missing  words  is  5,000.  The  missing  words  include 
many  proper  names  and  most  of  the  typographical  errors  generated  during 
transcription. 
CHAPTER 4

A MODEL FOR NATURAL LANGUAGE


1.  Introduction 

It is helpful to construct a theoretical foundation to explain the
important features of a predictive syntactic analysis technique for the
Russian language, empirically devised by Rhodes (Ref. 1) and adopted with
modifications at Harvard University (see Chapter 5). A working model of
natural language that can be analyzed by this technique is presented in this
chapter. This model is based on the formalization of the syntax of
Łukasiewicz' parenthesis-free notation given by Burks, Warren, and Wright,
on the linguistic model of Chomsky, and on Oettinger's theory of
syntactic analysis (Refs. 2 to 5). This theory utilizes a storage device
consisting of a linear array of storage elements, in which information is
entered and removed from one end only in accordance with a
"last-in-first-out" principle. Among programmers this storage device has
come to be known as a pushdown store. The importance of the pushdown store
for a similar analysis was recognized independently by Samelson and Bauer
(Ref. 6). Familiarity with the Burks, Warren, and Wright paper is assumed in
this chapter.

The  technique  of  predictive  syntactic  analysis  is  based  on  the 
observation  that  in  scanning  a  Russian  sentence  from  left  to  right,  it  is 
possible,  on  the  one  hand,  to  make  predictions  about  the  syntactic  structures 
that  occur  further  to  the  right,  and  on  the  other  hand,  to  determine  the 
syntactic  role  of  the  word  currently  being  examined  by  testing  it  against 
the  previously  made  predictions  that  it  might  fulfill.  The  predictions  are 
stored in a prediction pool, a device with characteristics approximately
those  of  a  simple  pushdown  store,  as  described  by  Oettinger.  Predictions 
are  tested  for  fulfillment  downward  from  the  top  of  the  prediction  pool,  but 
new  predictions  are  always  entered  at  the  top  of  the  pool. 
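
The behavior of the prediction pool can be sketched in a few lines of Python. This is a minimal illustration of the description above, not the Harvard implementation: the class name, the method names, and the prediction labels are all hypothetical, and a prediction is taken to be fulfilled when its label equals the syntactic role offered by the current word.

```python
class PredictionPool:
    """A pushdown-like store: tested from the top downward, but new
    predictions always enter at the top."""

    def __init__(self, initial_top_first=()):
        # The end of the internal list is the top of the pool.
        self._pool = list(reversed(list(initial_top_first)))

    def fulfil(self, role, new_predictions_top_first=()):
        """Test the predictions from the top downward. The first one that
        matches `role` is removed, and any new predictions generated by the
        word are entered at the top. Returns the fulfilled prediction, or
        None if no prediction in the pool matches."""
        for index in range(len(self._pool) - 1, -1, -1):
            if self._pool[index] == role:
                fulfilled = self._pool.pop(index)
                self._pool.extend(reversed(list(new_predictions_top_first)))
                return fulfilled
        return None

    def contents(self):
        """Return the pool with the topmost prediction first."""
        return list(reversed(self._pool))


# Hypothetical prediction labels, topmost first.
pool = PredictionPool(["subject", "predicate", "end of sentence"])
hit = pool.fulfil("predicate", ["object"])   # not the topmost prediction
miss = pool.fulfil("adverb")                 # nothing in the pool matches
```

Unlike a strict pushdown store, the pool allows a prediction below the top to be fulfilled, which is why the text describes its characteristics as only approximately those of a pushdown store.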

In his phrase structure model for the synthesis of English sentences,
Chomsky has related the syntactic roles of the words in a sentence to each
other by a hierarchy of grammatical rules expressed in the form

    X_i → Y_i

where Y_i is formed from X_i by the replacement of a single symbol of X_i by
some string of one or more symbols. The vocabulary that characterizes the
terminal strings is the set of English words of the sentence being
synthesized (Fig. 4-1). The rules for the derivation of the sample sentence of
Fig. 4-1 are given in Table 4-1.


Derivation of the Sentence: "The man hit the ball".

Fig.  4-1 
Sentence → NP + VP
NP → T + N
VP → V + NP
T → the
N → man, ball
V → hit


NP - noun phrase
VP - verb phrase
T - article
N - noun
V - verb


Rules for the Derivation of the Sentence: "The man hit the ball".

TABLE 4-1
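
The rules of Table 4-1 are enough to reproduce the derivation step by step. The Python sketch below is only an illustration: it assumes the rule VP → V + NP and always rewrites the leftmost symbol that is not yet an English word, so that each line of the derivation differs from the previous one by the replacement of a single symbol. The function name and the data layout are hypothetical.

```python
PHRASE_RULES = {
    "Sentence": ["NP", "VP"],
    "NP": ["T", "N"],
    "VP": ["V", "NP"],
}
LEXICON = {"T": {"the"}, "N": {"man", "ball"}, "V": {"hit"}}


def derive(sentence):
    """Return the successive lines of a leftmost derivation of `sentence`,
    or raise ValueError if the rules cannot produce it."""
    words = sentence.split()
    line = ["Sentence"]
    steps = [line[:]]
    position = 0          # everything left of this index is already a word
    while position < len(line):
        symbol = line[position]
        if symbol in PHRASE_RULES:
            # Replace one intermediate symbol by its expansion.
            line[position:position + 1] = PHRASE_RULES[symbol]
        else:
            # Replace one word-class symbol by the next word of the target.
            if position >= len(words) or words[position] not in LEXICON[symbol]:
                raise ValueError("sentence cannot be derived from the rules")
            line[position] = words[position]
            position += 1
        steps.append(line[:])
    if line != words:
        raise ValueError("sentence cannot be derived from the rules")
    return steps


derivation = derive("the man hit the ball")
for step in derivation:
    print(" + ".join(step))
```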


A statement in Łukasiewicz' parenthesis-free notation, as
described by Burks, Warren, and Wright, can be represented by a tree-like
structure, paralleling Chomsky's representation for sentence synthesis.
In the illustration (Fig. 4-2) three different types of characters are
used: the monadic functor N, the dyadic functor A, and the variables x_i.


Representation  of  the  Formula  A  =  KX2  Nx^. 

Fig.  4-2 


The set of functors in the parenthesis-free notation is analogous
to the set of characters, such as "NP", "VP", "T", etc., in the intermediate
language of phrase structure; the set of variables in the parenthesis-free
notation is analogous to the set of characters, the English words, in the
terminal language.

Oettinger's syntactic analysis theory is based on the proof of a
theorem for the algorithms that he has proposed. Let Δ, which represents
any formula in the universe of formulas to be analyzed, be split into
a head Δ_H, a middle Δ_M, and a tail Δ_T, such that Δ = Δ_H Δ_M Δ_T. Δ_M is
assumed to be "well-formed", while Δ_H and Δ_T are arbitrary residues
determined by the choice of Δ_M. The theorem states that if, at a certain
point in the left-to-right syntactic analysis of Δ, (1) Δ_H has been
analyzed, (2) the output of the analysis is a function of Δ_H, and (3) the
content of the pushdown store is a function of Δ_H only, then at a later
point, after Δ_M has been analyzed, the output will be a function of both
Δ_H and Δ_M, but the pushdown store still will be the function of Δ_H as in
condition (3).

Oettinger has defined a set of three parenthetic notations: the
familiar full parenthetic notation, a left parenthetic notation in which
all the right parentheses have been removed from the full parenthetic
notation, and a right parenthetic notation in which all the left parentheses
have been removed from the full parenthetic notation (Fig. 4-3). With this
theorem, he has shown the feasibility of translating between the
parenthesis-free notation and any one of the several alternative parenthetic
notations. The translation algorithms, which also yield syntactic analyses
of the formulas, have the following interesting properties:

1. The internal storage consists essentially of a
   single pushdown store.

2. The input formula is scanned in one direction only.
   Each character in the input formula is used once and
   only once and in sequence.

3. The algorithms translate successfully if and only if the
   input formula is well-formed.

Full parenthetic:     (~((x_1 + x_2) · x_3))

Left parenthetic:     (~((x_1 + x_2 · x_3

Right parenthetic:    ~x_1 + x_2) · x_3))

Parenthesis-free:     N M x^ A x^x^

Illustration of the Various Parenthetic Notations and the
Parenthesis-Free Notation

Fig. 4-3

Several  limitations  of  both  the  syntax  of  parenthesis-free 
notation  and  the  phrase  structure  grammar  led  to  the  development  of  a 
new  model.  In  a  natural  language  a  well-formed  subordinate  qualifier, 
such as a phrase or clause, can be added to or taken away from a
well-formed sentence with the resultant sentence remaining well-formed. This
property  must  be  reflected  in  a  model.  If  a  well-formed  string  of 
characters  is  added  to  or  taken  away  from  a  well-formed  formula  in  the 
parenthesis-free  notation,  the  resultant  formula  is  not  well-formed. 
Other  difficulties  also  arise  with  the  phrase  structure  model,  which 
was  designed  from  the  point  of  view  of  sentence  synthesis  rather  than  of 
sentence  analysis. 

To provide a theoretical basis for the analysis of natural
language and to account for some of its features, a new model of natural
language, characterized by the "essential" formula (Sec. 2), which is
analogous to the well-formed formula for artificial languages, is offered
in this chapter. In Sec. 3 are presented several algorithms, with a
theorem for each. Certain fundamental modifications to essential
formulas are proposed in Sec. 4, and the relationship of the model to
natural language is presented in Sec. 5.

The  essential  formula  and  its  subsequent  modification  are  a  logical 
method  for  developing  a  model,  corresponding  in  several  characteristics  to 
natural  language.  This  model  is  not  unique  but  has  several  attractive 
properties. 

In  the  development  of  the  algorithms  (Secs.  3  and  4),  Iverson's 
notation  (Appendix  A)  will  be  used. 


2.  The  Essential  Formula 

The  concepts  and  notation  of  Burks,  Warren,  and  Wright  will  be  used 
wherever  possible. 

Consider a language characterized as follows:

Definition 1: Any finite sequence of characters, including
the null sequence, is a formula.

"Λ" will designate the null formula. In general, lower case Greek
letters will signify single characters, whereas upper case Greek letters will
signify strings of characters or entire formulas. On occasion, formulas
will be considered as vectors of characters. The following terminology will
be used for formulas:
Let Δ = ΦΨ, where the juxtaposition of "Φ" and "Ψ" denotes
the concatenation of the formulas Φ and Ψ. Commas to indicate
concatenated formulas may or may not be supplied.

Definition 2:

(a) The length L(Δ) of Δ is the number of characters in Δ.

(b) The head h^i(Δ) is the unique formula Φ such that if
    1 ≤ i ≤ L(Δ), then L(Φ) = i; and if i > L(Δ), then
    h^i(Δ) = Δ.

(c) The tail t^j(Δ) is the unique formula Ψ such that if
    1 ≤ j ≤ L(Δ), then L(Ψ) = j; and if j > L(Δ), then
    t^j(Δ) = Δ.

(d) The proper head h_p(Δ) is the unique formula Φ such
    that if i < L(Δ), then L(Φ) = i.

(e) The proper tail t_p(Δ) is the unique formula Ψ such
    that if j < L(Δ), then L(Ψ) = j.

A head h(Δ) or a tail t(Δ) will be written without the superscripts
whenever this simpler notation is unambiguous.

Definition 3: Every character of a formula is either a functor
or a variable.

Definition 4: The three measures, weight (W), degree (D), and
measure (M), are defined as follows:

    δ         W(δ)    D(δ)    M(δ)

    x_i        1       0      -1
    F_i^n     1-n      n       n       (n > 0)
(n  >  0) 
Subscripts on a functor or a variable will be used for identification
purposes  and  have  no  inherent  significance.  Superscripts  on  a  functor  will 
be  used  to  indicate  the  measure  of  the  functor. 

Definition 5: The weight, degree, and measure of a formula are
equal, respectively, to the sums of the weights, degrees,
and measures of the characters of the formula.

Definition 6: A formula Δ is essential if and only if
M(Δ) = 0 and M_min[h(Δ)] ≥ 0, where M_min[h(Δ)] denotes the
least of the measures M[h^i(Δ)], 1 ≤ i ≤ L(Δ).

Example 1. Let Δ_1 = F_1^3 x_1 x_2 F_2^2 x_3 x_4 x_5.

    M(δ)          =  3, -1, -1,  2, -1, -1, -1
    M[h^i(Δ_1)]   =  3,  2,  1,  3,  2,  1,  0

Since M(Δ_1) = 0 and M_min[h(Δ_1)] ≥ 0, Δ_1 is essential. Δ_2 = Xj^Pp^x^x^F^^^x^
is nonessential, since M_min[h(Δ_2)] = -1. Δ_3 is nonessential,
since M(Δ_3) = 1.
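
The test for an essential formula amounts to a running sum over the characters. The Python sketch below is an illustration under an assumed encoding, which is not part of the original notation: a variable is written as the string "x" and a functor of measure n as the integer n. The counterexamples are hypothetical formulas chosen to violate one condition each.

```python
from itertools import accumulate


def measure(character):
    """M of a single character: -1 for a variable, n for a functor of measure n."""
    return -1 if character == "x" else character


def head_measures(formula):
    """The measures of the successive heads of the formula."""
    return list(accumulate(measure(character) for character in formula))


def is_essential(formula):
    """A formula is essential if its measure is 0 and no head has a
    negative measure. The null formula is not counted as essential."""
    heads = head_measures(formula)
    return bool(heads) and heads[-1] == 0 and min(heads) >= 0


example_1 = [3, "x", "x", 2, "x", "x", "x"]   # F^3 x x F^2 x x x
heads_1 = head_measures(example_1)

starts_with_variable = ["x", 2, "x", "x", 1, "x"]   # first head has measure -1
too_few_variables = [2, "x"]                        # total measure is 1
```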


Definition 7: A section Δ_s of a formula Δ consists of any contiguous
set of characters of Δ such that L(Δ_s) ≤ L(Δ). If L(Δ_s) < L(Δ),
then Δ_s is a proper section.

Definition 8: If an essential formula Δ has an essential proper
section Δ_s, then Δ is reducible. Conversely, if Δ has no
essential proper section Δ_s, Δ is irreducible.

Example 2. Δ_1 = F_1^3 x_1 x_2 F_2^2 x_3 x_4 x_5 is an essential reducible
formula with an essential proper section F_2^2 x_3 x_4 that is irreducible.




Definition 9: Δ is a positive formula if and only if
W_min[t(Δ)] > 0.

Lemma 1

Every essential formula is a positive formula.

PROOF: Consider an essential formula Δ = Δ_H Δ_T, where L(Δ_T) = n > 0.
Since M(Δ) = 0 and M(Δ_H) ≥ 0, M(Δ_T) ≤ 0. Consider the characters in Δ_T:

(a) If there are any functors in Δ_T, then, summing over the variables
    x_j and the functors F_i of Δ_T,

        - Σ M(x_j)  ≥  Σ M(F_i),

    and since M(F_i) > -W(F_i) by Def. 4, it follows that

        Σ M(F_i)  >  - Σ W(F_i);

    hence, since -M(x) = W(x) by Def. 4,

        Σ W(x_j)  >  - Σ W(F_i).

    Therefore,

        Σ W(x_j) + Σ W(F_i) > 0, and W(Δ_T) > 0.

(b) If there are no functors in Δ_T,

        W(Δ_T) = Σ W(x_j) > 0.

Since this holds for every n, 0 < n ≤ L(Δ), Δ must be positive.

Theorem 1

Every essential formula is either of the form
F^n x_1 x_2 ... x_n, or else it is reducible.

PROOF: Consider an essential formula Δ = Φ F^s x_1 x_2 ... x_n, where Φ is
the head of Δ to the left of F^s, and F^s is the rightmost functor of Δ.
The measure M(F^s) = s is less than or equal to n, for otherwise Δ would
not be a positive formula, as is guaranteed by Lemma 1. Therefore,
n - s + 1 ≥ 1, and there is a section, Δ_s = F^s x_1 x_2 ... x_s, which is
essential. If s = n and Φ = Λ, then Δ = Δ_s and is of the form
F^n x_1 x_2 ... x_n. Otherwise, Δ_s is a proper essential section of Δ
(indeed, Δ_s is irreducible), and Δ is reducible.

Corollary  1 

Every  essential  formula  contains  at  least  one  functor. 

Corollary  2 

An  essential  formula  with  one  and  only  one  functor  is 
irreducible. 

Theorem 2a

If Δ = Δ_H Δ_s Δ_T is a reducible essential formula, with a proper
essential section Δ_s, then the formula Δ_R = Δ_H Δ_T, resulting
from the removal of Δ_s, is also an essential formula.

PROOF: M(Δ_H) ≥ 0 and M(Δ_s) = 0, hence M(Δ_H Δ_s) = M(Δ_H). But
M(Δ_T) = -M(Δ_H Δ_s), since M(Δ) = 0. Hence, M(Δ_T) = -M(Δ_H) and
M(Δ_R) = M(Δ_H) + M(Δ_T) = 0.

Since M(Δ_s) = 0 and M_min[h(Δ)] ≥ 0, it follows that
M_min[h(Δ_R)] ≥ 0. Therefore Δ_R
is an essential formula.
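
Theorem 1 and Theorem 2a together suggest a simple reduction procedure, sketched below in Python. It is an illustration only, under an assumed encoding in which a variable is the string "x" and a functor of measure n is the integer n. Following the proof of Theorem 1, the essential section removed at each step is the one formed by the rightmost functor F^s and the s variables after it; the input is assumed to be an essential formula.

```python
def measure(character):
    """M of a single character: -1 for a variable, n for a functor of measure n."""
    return -1 if character == "x" else character


def is_essential(formula):
    """True if the formula has measure 0 and no head of negative measure."""
    total, lowest = 0, 0
    for character in formula:
        total += measure(character)
        lowest = min(lowest, total)
    return len(formula) > 0 and total == 0 and lowest >= 0


def rightmost_functor_section(formula):
    """Return (start, end) of the section F^s x ... x that begins at the
    rightmost functor. Assumes the formula is essential."""
    start = max(i for i, ch in enumerate(formula) if ch != "x")
    return start, start + formula[start] + 1


def reduce_fully(formula):
    """Remove essential proper sections one at a time until the formula
    has the irreducible form F^n x ... x. Returns every intermediate formula."""
    steps = [list(formula)]
    while True:
        current = steps[-1]
        start, end = rightmost_functor_section(current)
        if (start, end) == (0, len(current)):
            return steps              # the section is the whole formula
        steps.append(current[:start] + current[end:])


reduction = reduce_fully([3, "x", "x", 2, "x", "x", "x"])
```

Every formula in the returned list is essential, as Theorem 2a requires, and the last one consists of a single functor followed only by variables.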


Example  3n< 

(3) 

%  =  'V3*4- 


A  = 


-  M 


2 


and 


Theorem 2b

If Δ = Δ_H Δ_T is an essential formula and Δ_s is a second
essential formula, then the formula Δ_R = Δ_H Δ_s Δ_T, resulting
from the insertion of Δ_s, is an essential formula.

PROOF: Since M(Δ_s) = 0 and M(Δ) = 0, M(Δ_R) = 0. Since
M_min[h(Δ)] ≥ 0, M_min[h(Δ_s)] ≥ 0, and M(Δ_s) = 0, it follows
that M_min[h(Δ_R)] ≥ 0. Therefore Δ_R is an essential formula.


Example  3b.  A  =  A^  =  and 


Ap  = 


Theorem  2  leads  to  the  following  definitions: 


Definition 10: Starting with any functor in an essential
formula A, consider as a segment $\Sigma(A)$ the shortest
section to the right of, and including, the functor,

Lemma 2

Every essential formula A has a segment $\Sigma(A)$.

PROOF: An indirect proof will be used. A contradiction will be deduced
from the hypothesis that an essential formula A does not have a segment
$\Sigma(A)$. If A does not have a segment $\Sigma(A)$, then $M[t(A)] > 0$, where
$h_1[t(A)]$ is any functor in A, since $M[h_1\{t(A)\}] > 0$, and the variable is
the only character whose measure is less than 0. If $A = [h(A), t(A)]$, then
$M[h(A)] \ge 0$, since A is an essential formula. But $M(A) = M[h(A)] + M[t(A)] > 0$,
providing the contradiction.


Definition 11: Let $A = A_h\, \Sigma(A)\, A_t$. If the segment $\Sigma(A)$
is extracted from A, then the result of the concatenation
of the residual head and tail of A, $P(A) = A_h A_t$, is the
residue of the original formula A. $\Sigma(A)$ and $P(A)$ together
constitute a reduced set of the original essential formula A.


Lemma  3 

If A is an essential formula, then both $\Sigma(A)$ and $P(A)$
of every reduced set are essential formulas.


PROOF: Since the first character of $\Sigma$ is a functor such that $M[h_1(\Sigma)] \ge 0$,
and since the variable is the only type of character whose measure is less
than zero, then, for the smallest group of contiguous characters to the right
of and including the functor, for which $M(\Sigma) = 0$, it follows that
$M_{\min}[h(\Sigma)] \ge 0$ and that $\Sigma$ is an essential formula.

If an essential formula is divided into a segment and a residue, and
the segment is an essential formula, then by Theorem 2a the residue must be an
essential formula.
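
The division of a formula into a segment and a residue (Defs. 10 and 11) can be sketched as follows. This is an illustration under assumptions, not the report's own program: a formula is encoded as a list of measures (n for a functor $F^{(n)}$, -1 for a variable), the segment is taken as the shortest section of measure zero that begins at the chosen functor, and the names `segment_stop` and `reduced_set` are hypothetical.

```python
VAR = -1  # assumed measure of every variable


def is_essential(formula):
    """True when M(A) = 0 and no head of A has negative measure."""
    total = 0
    for m in formula:
        total += m
        if total < 0:
            return False
    return total == 0


def segment_stop(formula, start):
    """End index of the shortest zero-measure section starting at `start`."""
    if formula[start] == VAR:
        raise ValueError("a segment must begin with a functor")
    total = 0
    for i in range(start, len(formula)):
        total += formula[i]
        if total == 0:
            return i + 1
    raise ValueError("no segment found; the formula is not essential")


def reduced_set(formula, start):
    """Return (segment, residue) for the functor at position `start`."""
    stop = segment_stop(formula, start)
    return formula[start:stop], formula[:start] + formula[stop:]


# F(3) x F(2) x F(2) x x x x x
a = [3, VAR, 2, VAR, 2, VAR, VAR, VAR, VAR, VAR]
segment, residue = reduced_set(a, 2)
```

For every functor position of the essential formula `a`, both parts returned by `reduced_set` are again essential, which is the content of Lemma 3.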


Definition 12: A completely reduced set of an essential
formula consists of a set of irreducible essential
formulas obtained by treating both the segment and the
residue of a reduced set of the essential formula as
essential formulas, and by iterating the process of
dividing every such essential formula into a reduced set.


Definition 13: A variable is associated with a functor if the
variable and functor are members of the same irreducible
essential formula of a completely reduced set.
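
A completely reduced set, and with it the association of variables with functors, can be computed by repeatedly extracting the irreducible section that begins at the rightmost functor. The sketch below is a hedged illustration: the list-of-measures encoding (n for $F^{(n)}$, -1 for a variable) and the function name are assumptions, and positions in the original formula stand in for the characters themselves.

```python
VAR = -1  # assumed measure of every variable


def completely_reduced_set(formula):
    """Return sorted (functor_position, [variable_positions]) pairs.

    Each pair is one irreducible essential formula of the completely
    reduced set; the variables listed are those associated with the functor.
    """
    remaining = list(enumerate(formula))  # (original position, measure)
    groups = []
    while remaining:
        # Index (within `remaining`) of the rightmost functor, if any.
        k = max((i for i, (_, m) in enumerate(remaining) if m != VAR),
                default=None)
        if k is None:
            raise ValueError("variable left without a functor")
        position, n = remaining[k]
        arguments = remaining[k + 1:k + 1 + n]  # all variables: k is rightmost
        if len(arguments) < n:
            raise ValueError("functor lacks variables; not essential")
        groups.append((position, [p for p, _ in arguments]))
        del remaining[k:k + 1 + n]  # extract the irreducible section
    return sorted(groups)


# F(3) x F(2) x F(2) x x x x x
a = [3, VAR, 2, VAR, 2, VAR, VAR, VAR, VAR, VAR]
groups = completely_reduced_set(a)
```

Here the functor at position 0 is associated with the variables at positions 1, 8, and 9, the functor at position 2 with those at 3 and 7, and the functor at position 4 with those at 5 and 6.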


^  ^  reduced  set  of  A 

is  ^'2  ^^2  ^3^^^3^4^5  toother  reduced  set  of  A  is 

(2)„  p(3),,  . _ K  *  ^  •0(2). 

1  "^1 


Example  4‘ 
.(3)^  Jl) 


and  A  completely  reduced  set  of  A  is  f|'^^x^x^, 


F, 


Lemma  4 

The completely reduced set of an essential formula
containing one functor (i.e., an irreducible essential
formula) is unique, namely, itself.

PROOF: Lemma 4 is an immediate consequence of Def. 12.

Lemma 5

If an essential formula A is divided into a reduced
set consisting of a segment $\Sigma(A)$ and a residue
$P(A)$, then any irreducible essential section $A_s$ of A
must either be contained entirely within $\Sigma$, or
entirely outside of $\Sigma$.


PROOF: Since $\Sigma$ and $A_s$ each consist of contiguous characters, the only
alternatives to the possibilities stated in the Lemma are that either
$t_p(A_s) = h_p(\Sigma)$ or that $h_p(A_s) = t_p(\Sigma)$.

The former is impossible by Def. 10 and Theorem 1, since $h_1(\Sigma) = F_j$
and $t_p(A_s)$ can contain no functor.

To prove that the latter is impossible, let $\Sigma = [h(\Sigma), t_p(\Sigma)]$ such that
$t_p(\Sigma) = h_p(A_s)$. Since $M[h(\Sigma)] \ge 0$ and $M[h_p(A_s)] > 0$, therefore
$M[\Sigma(A)] > 0$, which contradicts the definition of a segment.


Lemma 6

Let an essential formula $A_r$ differ from an essential
formula A by some irreducible essential formula $A_s$
extracted from or added to A by the appropriate
process of Theorem 2. Consider a reduced set $\Sigma(A_r)$
and $P(A_r)$ of $A_r$:

(a) If $\Sigma(A_r)$ contains $A_s$, and A is divided into a
reduced set, $\Sigma(A)$ and $P(A)$, such that either
$\Sigma(A) = \Lambda$ or $h_1[\Sigma(A)] = h_1[\Sigma(A_r)] = F_j$, then
$P(A_r) = P(A)$, and the residue, $P[\Sigma(A_r)]$, of
$\Sigma(A_r)$ when $A_s$ is removed, is identical to $\Sigma(A)$.

(b) If $P(A_r)$ contains $A_s$, and A is divided into a
reduced set, $\Sigma(A)$ and $P(A)$, such that
$h_1[\Sigma(A)] = h_1[\Sigma(A_r)] = F_j$, then
$\Sigma(A_r) = \Sigma(A)$, and the residue, $P[P(A_r)]$, of
$P(A_r)$ when $A_s$ is removed, is identical to $P(A)$.

PROOF: (of Lemma 6a)

(A) $\Sigma(A) = \Lambda$, which is equivalent to saying that $F_j$ is the
functor of $A_s$. This is the trivial case for which $\Sigma(A_r) = A_s$,
$\Sigma(A) = \Lambda$, and $A = P(A) = P(A_r)$.

(B) $F_j$ is not the functor of $A_s$. $h_1[\Sigma(A)] = h_1[\Sigma(A_r)]$, where
$h_1[\Sigma(A_r)] = F_j$.

$\Sigma(A_r) = [h\{\Sigma(A)\}, A_s, t\{\Sigma(A_r)\}]$ and $\Sigma(A) = [h\{\Sigma(A)\}, t\{\Sigma(A)\}]$.

Since $M(A_s) = 0$, $M[h\{\Sigma(A)\}, A_s] = M[h\{\Sigma(A)\}]$. Since $M[\Sigma(A)] = M[\Sigma(A_r)] = 0$,
$M[t\{\Sigma(A)\}] = M[t\{\Sigma(A_r)\}]$. Also, by Theorem 2, if $A = A_h A_t$ and $A_r = A_h A_s A_t$,
then $h_1[t\{\Sigma(A)\}] = h_1[t\{\Sigma(A_r)\}]$. It follows that $t[\Sigma(A)] = t[\Sigma(A_r)]$,
since both strings are identical. $P(A) = P(A_r)$ and $P[\Sigma(A_r)] = \Sigma(A)$ when
$\Sigma[\Sigma(A_r)] = A_s$.

PROOF: (of Lemma 6b)

Since all the characters of $\Sigma(A_r)$ are characters of A, $\Sigma(A) = \Sigma(A_r)$
by Defs. 8 and 10. $P(A_r)$ differs from $P(A)$ by $A_s$. Take $\Sigma[P(A_r)]$, such
that the functor of $A_s$ is the first functor of $\Sigma[P(A_r)]$. $P[P(A_r)] = P(A)$
by Lemma 6a, wherein the A, $A_r$, and $A_s$ of Lemma 6a are the $P(A)$, $P(A_r)$, and
$A_s$ of Lemma 6b.

Lemma 7

The result of the collection of a completely reduced set of
a segment $\Sigma$ of an essential formula A and of a completely
reduced set of the corresponding residue P of A is a
completely reduced set of A.


PROOF: The Lemma is a direct consequence of Def. 11, Def. 12 and Lemma 3.

Theorem 3

Every essential formula A has a unique completely reduced set.
The proof of this theorem suggests a technique for obtaining
the completely reduced set of A.


PROOF: The proof is by induction on the number n of functors in A.

(a) n = 1: by Lemma 4.

(b) n > 1: assume the theorem is true for all k < n.

Reduce A into a segment $\Sigma(A)$ and a residue $P(A)$. Consider the
irreducible essential segment $A_s$ whose head is the rightmost functor of A
(Theorem 1).

Case 1: The segment $\Sigma_1(A) = A_s$. By the inductive hypothesis, the
residue $P_1(A)$, containing n-1 functors, has a unique completely reduced set.
In this case the combination of this set with $A_s$ in the manner of Lemma 7
gives the desired result.


Case  2i  The  segment  ^  ^0* 

(a)  Z2(^)  contains  A^,  which  can  be  written  as  A^  = 

such  that  ^  =  ^2^^^  ^  ~  ^3  [^2^*^^]  ’  Lemma  6a,  wherein  the  Pq^(A),  A, 

and  Ag  of  this  theorem  are  the  A,  Zl^,  and  A^  of  Lemma  6a,  P2(^)  “  ond  is 
identical  to  the  residue,  P^[P2(Z^)]»  remaining  when  a  segment, 
starting  with  the  same  functor  as  Z2{A),  is  removed  from  P^(A),  and 

(b)  P2(A)  contains  A^,  which  can  be  written  as  A^  =  Z^  [^2^^^^]  * 

such  that  4^  =  ^2^^^  ^2  ~  ^$[^2^^^]  *  Lemma  6b,  wherein  the  Pj^(A),  A, 

and  Ag  of  this  theorem  are  the  A,  and  A^  of  Lemma  6b,  Z2(Z^)  ~  ^ 


h 
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identical  to  the  segment,  Z5[Pi(^))»  starting  with  the  same  functor  as 
and  pJ?i(A)  =  4^. 


By  the  inductive  hypothesis,  each  of  and  in  (a)  and  (b)  has  a 
unique  completely  reduced  set*  It  has  been  shown  that  A^  and  4^  are  a 
reduced  set  of  P^(A),  hence  the  collection  of  their  completely  reduced 
sets  is  the  completely  reduced  set  of  P2^(A)  (Lemma  7),  which  proves  the 
theorem. 


3. Algorithms to Test for Essential Formulas

The basic essential formula of Sec. 2 bears little resemblance to
syntactic analogues in any natural language, so that additions and
modifications have to be made to the initial definitions of an essential
formula to bring the language model closer to natural language. The
first proposed algorithm provides a mechanism for testing whether or not
a formula is essential (Sec. 3A), while the next two algorithms make
similar tests on modified versions of an essential formula (Sec. 3B and
Sec. 3C).

A notation for paths through flow diagrams will be useful. In a
flow diagram, such as Program 4-1, the expression (x,y) will be used to
express any path starting at and including step x and terminating at but
not including step y. If more than two symbols, for example, (x,u,v,y),
are used, the path must pass through the intermediate steps, steps u and
v, in order, before terminating at step y. The expression x/y indicates
that there is a direct transfer from step x to step y after the operation
of step x. This is shown in the diagram by an arrow.

A. The Basic Algorithm

An algorithm is introduced to test an arbitrary formula to determine
whether or not it is essential. The algorithm, called Algorithm 1
(Program 4-1), provides a mechanism whereby parentheses are placed around
every segment of a reduced set of the formula on the same left-to-right pass
that determines whether the formula is essential.

The symbols $\Phi_0$ and $\Phi_1$ represent the input and output files that may
contain part or all of a formula (Table 4-2). $\Phi_1$ and $\Phi_2$ are initialized
(Steps 1 and 2). In step 3, a character is read out of $\Phi_0$. This character
is identified either as a functor or a variable in step 4. If the character
is a functor, a left parenthesis and the functor are written on $\Phi_1$ (Step 5).
The set of characters comprising the identity permutation vector $\iota$, with
$L(\iota) = M(F_j)$, is written on file $\Phi_2$ in the forward direction in step 6, after
which the process returns to step 3. These steps will remain invariable, even
after various restrictions are applied to essential formulas.

It should be noted that while $\Phi_0$ and $\Phi_1$ are read and written,
respectively, in the forward direction only (corresponding to normal
left-to-right reading and writing), the prediction pool, $\Phi_2$, is written in the
forward direction and read in the backward direction, that is, $\Phi_2$ is written
from left to right but read from right to left. The mechanism of writing in
the forward direction and then reading in the backward direction is equivalent
to the operation of a pushdown store. The individual characters written
on $\Phi_2$ will be referred to as predictions.


Symbol            Meaning

$\Phi_0$          Input file containing arbitrary formula.
$\Phi_1$          Output file.
$\Phi_2$          Prediction pool.
$\Phi_3$          Hindsight file (Algorithms 4 and 5 only).
c                 Current character under consideration.
b, $\mathbf{b}$   Current prediction or set of predictions from
                  prediction pool.
F                 Set of functors.
s(x)              Class to which variable x belongs
                  (Algorithms 2-5 only).
a                 Alternative arguments of current variable
                  (Algorithms 4 and 5 only).
q                 Possible preferred arguments
                  (Algorithms 4 and 5 only).

TABLE 4-2. Symbols for Algorithms 1 through 5

If the character being tested in step 4 is a variable, the
algorithm proceeds directly to step 7 where the last character (or
prediction) stored in the prediction pool is read. This prediction
is one component of $\iota$. In step 8, the character being tested is
written onto the output file, and in step 9, the prediction that has
just been read out of the prediction pool is tested to determine whether
or not it is the last prediction of a given set, that is, whether
$t_1(\Phi_2) = 1$. If so, a right parenthesis is written onto the output
file (step 10); in either case, the process returns to step 3.
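
The steps described above can be sketched as a short program. This is a minimal illustration and not Program 4-1 itself: the token format (a functor written as "F" followed by its measure, any other token treated as a variable), the use of Python lists for the files, and the function name are assumptions. A functor is assumed to have measure of at least 1.

```python
def algorithm_1(tokens):
    """Return (essential, output) for a list of character tokens.

    A token such as "F3" is read as a functor of measure 3 (an assumed
    encoding); every other token is read as a variable.
    """
    output = []  # the output file
    pool = []    # the prediction pool, used as a pushdown store
    for c in tokens:                          # step 3: read a character
        if c.startswith("F"):                 # step 4: functor or variable
            n = int(c[1:])
            output += ["(", c]                # step 5
            pool.extend(range(1, n + 1))      # step 6: write 1, 2, ..., n
        else:
            if not pool:                      # step 7 finds the pool empty
                return False, output
            b = pool.pop()                    # step 7: read last prediction
            output.append(c)                  # step 8
            if b == 1:                        # step 9: last of its set
                output.append(")")            # step 10
    # Input exhausted at step 3: essential only if the pool is empty again.
    return not pool, output


tokens = ["F3", "x1", "F2", "x2", "F2", "x3", "x4", "x5", "x6", "x7"]
essential, output = algorithm_1(tokens)
```

For the formula above the pass succeeds and the output file holds `( F3 x1 ( F2 x2 ( F2 x3 x4 ) x5 ) x6 x7 )`, one pair of parentheses per functor. A formula such as `F2 x` leaves a prediction in the pool, and a lone variable finds the pool empty; both are reported as nonessential.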

(3)  (2)  (2) 

E)cample  5.  n  Fj  ^x^x^x^x^Xr^.  After  analysis, 

/^\  1  r 


L  1  h 


^2  ^2 


(2) 

"3  ^3^1iJ 


After  analysis,  “ 


t'fk 


"^2  V3. 


(3)  (2)  (2) 

^2  “  Y  ^1  ^2  V3  ^3  Vf6^7- 


L^3 


A3  =  Fp^x^XgX^  F^^^x^x^  rj^^x^x^.  After  analysis. 


V? 


r  (3) 

‘3  ■  1*^1  W3 


p(2)_ 

^2 


^  I  F^^^x  X 

^3  Y?. 


Several definitions referring to Algorithm 1 and the succeeding
algorithms are introduced:

Definition 14: Any path (3,3) is a formula cycle
of Algorithm 1.

Definition 15: Algorithm 1 is operable if and only
if an integral number of formula cycles are
traversed. Algorithm 1 is operable for the null
formula $\Lambda$.

Definition 16: Algorithm 1 is effective if an integral
number of formula cycles are traversed and if
$\Phi_{2,\mathrm{final}} = \Phi_{2,\mathrm{initial}}$.

Definition 17: Algorithm 1 is strictly effective if an
integral number of formula cycles are traversed, and
if $\Phi_{0,\mathrm{final}} = \Phi_{2,\mathrm{final}} = \Phi_{2,\mathrm{initial}} = \Lambda$.


The process of Algorithm 1 is continued until the terminating
conditions of either step 3 or step 7 are reached. If the process
terminates at step 3 and the path is strictly effective, then the
formula is essential. If the algorithm is not strictly effective or if
the process terminates at step 7, then the formula is nonessential
(Theorem 4).


Lemma 8

At step 3: $L(\Phi_2) = M[h(A)] \ge 0$, where $h(A)$
represents the characters of A that have been
processed.


PROOF: In the analysis of a character of A, either the path (3,6/3) or the
path (3,9,3) must be followed. If path (3,6/3) is followed, the character
is a functor $F_j$, and $L(\Phi_{2,\mathrm{new}}) = L(\Phi_{2,\mathrm{old}}) + M(F_j)$. If path (3,9,3) is
followed, the character is a variable $x_i$, and $L(\Phi_{2,\mathrm{new}}) = L(\Phi_{2,\mathrm{old}}) + M(x_i)$,
where $M(x_i) = -1$. But $\Phi_2$ is initially set to $\Lambda$, so that $L(\Phi_2) = M[h(A)]$.

Q.E.D.
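
The invariant of Lemma 8 can be observed directly by recording the length of the prediction pool after each character. The sketch below assumes the list-of-measures encoding (n for a functor $F^{(n)}$, -1 for a variable); the function name is hypothetical.

```python
from itertools import accumulate

VAR = -1  # assumed measure of every variable


def pool_lengths(formula):
    """Length of the prediction pool each time step 3 is reached again."""
    pool = []
    lengths = []
    for m in formula:
        if m != VAR:
            pool.extend(range(1, m + 1))  # functor: write m predictions
        else:
            if not pool:                  # step 7 cannot read: stop
                break
            pool.pop()                    # variable: read one prediction
        lengths.append(len(pool))
    return lengths


a = [3, VAR, 2, VAR, VAR, VAR, VAR]   # F(3) x F(2) x x x x
head_measures = list(accumulate(a))   # M[h(A)] for every head of A
```

For every processed head the pool length equals the measure of that head, which is the statement of the lemma.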


Lemma 9

Algorithm 1 is effective for an irreducible essential
segment $A_s$ of a formula $A = A_h A_s A_t$ if Algorithm 1 is
operable for $A_h$.

PROOF: If Algorithm 1 is operable for $A_h$, step 3 is reached such that
$\Phi_0 = A_s A_t$ and $\Phi_2$ is an arbitrary function $g[A_h]$ of $A_h$.

Let $A_s = F^{(n)} x_n x_{n-1} \cdots x_1$.

After (3,6/3): $\Phi_0 = x_n x_{n-1} \cdots x_1 A_t$,

$\Phi_2 = g[A_h], 1, 2, \ldots, n-1, n$.

The next n-1 paths are (3,9/3). At step 9 in each formula cycle
$b \ne 1$. After n-1 formula cycles, at step 3:

$\Phi_0 = x_1 A_t$, $\Phi_2 = g[A_h], 1$.

The next path is (3,10/3): $\Phi_0 = A_t$, and $\Phi_2 = g[A_h]$.

Since $\Phi_{2,\mathrm{final}} = \Phi_{2,\mathrm{initial}}$, the algorithm is effective for $A_s$.

Theorem 4 ($A_s$-theorem for Algorithm 1)

For an arbitrary $A = A_h A_s A_t \ne \Lambda$, where $A_s$ is an
essential formula, Algorithm 1 is effective for $A_s$
if Algorithm 1 is operable for $A_h$.

PROOF: If Algorithm 1 is operable for $A_h$, step 3 is reached such that
$\Phi_0 = A_s A_t$ and $\Phi_2$ is an arbitrary function $g[A_h]$ of $A_h$.

The proof is by induction on the number n of functors of $A_s$.

(a) n = 1: by Lemma 9;

(b)  n  >  Is  assume  true  for  all  k  <  n.  Consider  Z  [A(n)  =  A^, 
an  irreducible  segment,  and  P[A(n)j  =  A^^,  where  4^(s)  = 

Aj^A^  =  A(n-l)  by  Lemma  7,  such  that  A(n)  =  AgA^A^A^. 
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By  the  Inductive  hypothesis  on  for  k  ■=  n-1,  step  3  I0  reached 
such  that$Q  =  and  <t>2  “ 

By  Lemma  9,  step  3  is  reached  such  thatO^  =  and 

*^2  =  ■ 

But  now,  once  more  by  the  inductive  hypothesis  on  ^  for  k  =  n-1, 

step  3  is  reached  such  that  and$2  =  s[^]  *  ^2  final  “  ^2  initial^ 

Algorithm  1  is  effective  for 

Theorem 5

Algorithm  1  is  strictly  effective  if  and  only  if  A  is 
an  essential  formula.  A  pair  of  parentheses  is  placed 
around  every  segment  of  a  reduced  set  of  an  essential 
formula. 

PROOF: (A) Sufficiency: by Theorem 4, if $A_h = A_t = \Lambda$ and $\Phi_{2,\mathrm{initial}} = \Lambda$.

(B) Necessity: will be shown by an indirect proof. A contradiction
will be deduced from the hypothesis that the algorithm is strictly effective
for a formula A that is not essential. A is not an essential formula only if
either $M_{\min}[h(A)] < 0$ or $M(A) \ne 0$.

(1) $M_{\min}[h(A)] < 0$. There must be a longest head $A_i$ of A
such that $M_{\min}[h(A_i)] \ge 0$ and $M(A_i) = 0$. $A_i$ can $= \Lambda$. Also, there must
exist an $x_k$ such that $A = A_i\, x_k\, t(A)$, since the variable is the only
character with a negative measure. Step 3 will be reached such that:

$\Phi_0 = x_k, t(A)$, and

$\Phi_2 = \Lambda$.

The next path is (3,7), $b = \Lambda$, and the path cannot
be completed.

(2) $M(A) \ne 0$ and $M_{\min}[h(A)] \ge 0$, since otherwise the process
fails by (1). Therefore, $M(A) > 0$. $A = h(A)\, x_1, x_2, \ldots, x_p$, where $t_1[h(A)]$
is the rightmost functor of A, and $0 \le p < M[h(A)]$. Step 3 must be reached
such that:

$\Phi_0 = x_1, x_2, \ldots, x_p$, and

$\Phi_2 = 1, a_2, a_3, \ldots, a_q$,

where $q = M[h(A)]$ by Lemma 8, hence $q > p$. After p formula cycles (3,9/3),
$\Phi_2 = 1, a_2, \ldots, a_r$, where $r = q - p \ne 0$. This process will terminate at
step 3 but, since $\Phi_2 \ne \Lambda$, the algorithm will not be strictly effective.

(C) Parentheses placement: Algorithm 1 is effective for any
$\Sigma(A)$ by Theorem 4, and $\Phi_2$ need not be empty when the initial functor of
$\Sigma(A)$ is being analyzed. Since the first character of $\Sigma(A)$ is a functor
(Def. 10), a left parenthesis will be placed to the left of that functor
on $\Phi_1$ (Step 5). Since the path for the segment is effective, the last
prediction read from $\Phi_2$ is a "1". The path must end (10/3), so that a
right parenthesis is placed after all the characters of the segment
have been written on $\Phi_1$.

B. Ordered Variables

To make the predictions more meaningful, the variables have been
restricted so that they are predicted individually and not merely counted
as in Algorithm 1 (Def. 18). In natural language, the requirement that in
a sentence a subject, predicate, and object occur in a given order is
tantamount to the restriction of Def. 18.

Definition 18: (Def. 5 revised) A formula A is essential
if and only if $M(A) = 0$, $M_{\min}[h(A)] \ge 0$, and the
variables associated with a functor belong to disjoint
classes in the order n, n-1, ..., 2, 1 specified for a
functor $F^{(n)}$.

In Sec. 4.3C the restriction will be relaxed so that the predictions
need not be fulfilled in the same order as they are made.

For example, consider the irreducible essential formula
$A = F^{(n)} x^n x^{n-1} \cdots x^2 x^1$. The class s(x) to which a variable belongs is
denoted by an integral superscript, $x^s$.

The  only  change  to  Algorithm  1  necessary  to  identify  an  ordered 
essential  formula  is  the  addition  of  step  8  of  Algorithm  2  (Program  4-2), 
where  the  class  to  which  the  variable  being  tested  belongs  is  compared  to 
that  of  the  last  prediction  stored  in  the  pool  (Theorems  6  and  7) . 

A  prediction  will  be  considered  fulfilled  if  it  is  identical  to  the 
class  of  a  variable. 

Example 6. $A_1 = F_1^{(3)} x_1^3 F_2^{(2)} x_2^2 F_3^{(2)} x_3^2 x_4^1 x_5^1 x_6^2 x_7^1$. After analysis,

$A_1 = [F_1^{(3)} x_1^3 [F_2^{(2)} x_2^2 [F_3^{(2)} x_3^2 x_4^1] x_5^1] x_6^2 x_7^1]$.

$A_2 = F_1^{(3)} x_1^3 F_2^{(2)} x_2^2 F_3^{(2)} x_3^2 x_4^2 x_5^1 x_6^1 x_7^1$ is nonessential, since

$A_2 = [F_1^{(3)} x_1^3 [F_2^{(2)} x_2^2 [F_3^{(2)} x_3^2 x_4^2 \cdots$, and the two variables associated with $F_3$
belong to the same class.
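
The class comparison that distinguishes Algorithm 2 can be sketched by adding one test to the counting procedure. The sketch is illustrative only: the token encoding `("F", n)` for a functor of measure n and `("x", s)` for a variable of class s, the bracket characters, and the function name are assumptions, and the two test formulas follow the class patterns 3, 2, 2, 1, 1, 2, 1 and 3, 2, 2, 2, 1, 1, 1 read from Example 6.

```python
def algorithm_2(tokens):
    """Return (essential, output) under the ordering restriction."""
    output = []  # the output file
    pool = []    # the prediction pool, used as a pushdown store
    for kind, value in tokens:
        if kind == "F":
            output += ["[", f"F({value})"]
            pool.extend(range(1, value + 1))  # predictions 1, 2, ..., n
        else:
            if not pool:                      # nothing left to fulfill
                return False, output
            b = pool.pop()                    # step 7: last prediction
            if b != value:                    # step 8: class must match
                return False, output
            output.append(f"x^{value}")
            if b == 1:                        # last prediction of its set
                output.append("]")
    return not pool, output


def formula(classes):
    """F(3) x F(2) x F(2) x x x x x with the given variable classes."""
    c = iter(classes)
    shape = ["F3", "x", "F2", "x", "F2", "x", "x", "x", "x", "x"]
    return [("F", int(t[1:])) if t[0] == "F" else ("x", next(c))
            for t in shape]


a1 = formula([3, 2, 2, 1, 1, 2, 1])
a2 = formula([3, 2, 2, 2, 1, 1, 1])
```

The first formula is accepted and bracketed; the second is rejected at its fourth variable, where a class 2 variable meets the prediction "1".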

The  proof  of  Lemma  8  is  valid  for  Algorithm  2. 


Lemma 10


Algorithm 2 is effective for an irreducible
essential segment $A_s$ of a formula $A = A_h A_s A_t$
if Algorithm 2 is operable for $A_h$.


PROOF: This proof is similar to the proof for Lemma 9. If Algorithm 2 is
operable for $A_h$, step 3 is reached such that $\Phi_0 = A_s A_t$ and $\Phi_2$ is an
arbitrary function $g[A_h]$ of $A_h$.

Let $A_s = F^{(n)} x_n^n x_{n-1}^{n-1} \cdots x_2^2 x_1^1$.

After (3,6/3): $\Phi_0 = x_n^n x_{n-1}^{n-1} \cdots x_2^2 x_1^1 A_t$,

$\Phi_2 = g[A_h], 1, 2, \ldots, n-1, n$.

The next n-1 formula cycles are (3,10/3). At step 8 of each formula
cycle, $b = s \ne 1$. After n-1 formula cycles, step 3 is reached such that:

$\Phi_0 = x_1^1 A_t$, $\Phi_2 = g[A_h], 1$.

The next formula cycle is (3,11/3):

$\Phi_0 = A_t$, $\Phi_2 = g[A_h]$.

Since $\Phi_{2,\mathrm{initial}} = \Phi_{2,\mathrm{final}}$, Algorithm 2 is effective for $A_s$.


Theorem 6 ($A_s$-theorem for Algorithm 2)

For an arbitrary $A = A_h A_s A_t \ne \Lambda$, where $A_s$ is
an essential formula, Algorithm 2 is effective
for $A_s$ if Algorithm 2 is operable for $A_h$.


PROOF: The inductive proof is parallel to that of Theorem 4 (with
Lemma 10 substituted for Lemma 9).

Theorem  7 

Algorithm 2 is strictly effective if and only if A
is an essential formula. A pair of parentheses is
placed around every segment of a reduced set of the
essential formula.

PROOF: (A) Sufficiency: by Theorem 6, if $A_h = A_t = \Lambda$ and $\Phi_{2,\mathrm{initial}} = \Lambda$.

(B) Necessity: will be shown by an indirect proof. A
contradiction will be deduced from the hypothesis that the algorithm is
strictly effective for a formula A that is not essential. A is not an
essential formula only if:

(1) $M_{\min}[h(A)] < 0$, or

(2) $M(A) \ne 0$, or

(3) the variables are out of order.

If conditions (1) or (2) exist, the proof is parallel to proof B in
Theorem 5. If the variables are out of order, there will be a step 8
such that $b \ne s$, and the path cannot be completed.

(C) Parentheses placement: the proof is parallel to proof C of
Theorem 5.

C.  Relaxation  of  Order  Restriction 

If the ordering restriction (Def. 18 and Algorithm 2) on the
variables is relaxed, then the top prediction in the pool need not be the
only prediction which must be compared to the class of the variable being
tested (Algorithm 3). For example, if $A_1 = F^{(2)} x^2 x^1$ and $A_2 = F^{(2)} x^1 x^2$,
both $A_1$ and $A_2$ can be considered essential if the ordering restriction is
relaxed, while only $A_1$ would be considered essential by Algorithm 2. This
is equivalent to a natural language where the subject, predicate, and object
within a clause are expected to occur in a given order, but where it is
possible for the order to be permuted.


Definition 19: (Defs. 5 and 18 revised) A formula A is
essential if and only if $M(A) = 0$, $M_{\min}[h(A)] \ge 0$,
and the variables associated with a functor belong to
disjoint classes c, where $1 \le c \le n$ if the functor is of
measure n. The variables may occur in any order
whatsoever.


In  Algorithm  3  (Program  4-3)  as  opposed  to  Algorithms  1  and  2,  it 
is  necessary  to  search  among  a  set  of  predictions  in  the  prediction  pool 
for  fulfillment  rather  than  merely  to  take  the  topmost  prediction  from  the 
pool. 

As shown by the $A_s$-theorem, it is necessary to fulfill the predictions
of the rightmost analyzed functor before fulfilling the predictions of the
other functors further to the left. In Algorithm 1, since the variables
were merely being counted, the fulfillment of a "1" prediction in the
prediction pool was an indication that the last variable associated with a
given functor had been found. A right parenthesis was inserted on the output
file after the variable was copied. In Algorithm 2, the indication in the
prediction pool was also a "1" because of the ordering restriction. An $x^1$,
the "last" variable associated with a given functor, could occur only after an
$x^n, \ldots, x^3, x^2$ had been found for the associated functor $F^{(n)}$. After the $x^1$
was identified and copied, a right parenthesis could be written on the
output file.

With the relaxed ordering restriction (Algorithm 3), a new device
must be introduced to determine when all the variables associated with a
functor have been identified. In the example cited previously,
$A = F^{(2)} x^1 x^2$, $x^1$ is obviously not the "last" variable to be associated
with $F^{(2)}$. A sentinel must be inserted into the prediction pool to isolate
the predictions associated with different functors. All the predictions
preceding the first sentinel in the prediction pool (reading in the usual
right-to-left order) are tested, and any one of these can be fulfilled by
a single given variable. The sentinel both restricts the variables to one
member of each class that can be fulfilled, and marks the number of
predictions which can be fulfilled by variables associated with a given
functor, so that no more than n variables are associated with a functor $F^{(n)}$.

For example, if the first two characters of $A_1 = F_1^{(3)} F_2^{(2)} x_1 x_2 x_3 x_4 x_5$
have been analyzed, $\Phi_2 = s,1,2,3,s,1,2$, where s represents the sentinel.
$A_1$ is nonessential since $x_2$ belongs to class "3" and must be associated
with $F_2$. If no sentinel were in $\Phi_2$, the "3" prediction would be fulfilled
by $x_2$. Likewise, if the first two characters of $A_2 = F_1^{(2)} F_2^{(2)} x_1^1 x_2^1 x_3^2 x_4^2$ have
been analyzed, $\Phi_2 = s,1,2,s,1,2$. $A_2$ is nonessential, since $x_1$ and $x_2$ both
are associated with $F_2$ and both belong to class "1". The sentinel prevents
the second "1" prediction, located to the left of the rightmost sentinel,
from being fulfilled by $x_2$.

In Algorithm 3, the predictions generated by each functor $F^{(n)}$ are
considered as elements of a vector associated with that functor. An end
of vector symbol that separates vectors written on a serial file is assumed
implicitly in the operation of the file. These end of vector symbols are
also used as the needed sentinels in Algorithm 3.

Algorithm 2 has been modified somewhat for this purpose. Steps
1 to 6 remain unchanged. If the character under consideration is a variable,
the last vector in the prediction pool is read into b in step 7. In step 8,
the class to which the current variable belongs is mapped onto b, which
should contain all the unfulfilled predictions associated with the rightmost
not completely analyzed functor. If a prediction can be fulfilled, the
variable is written on the output file in step 9, and the prediction is
removed from b in step 10. If there are no other predictions left in b, this
is an indication that all the variables associated with the functor have been
identified, so that a right parenthesis is written on the output file before
the algorithm returns to step 3.

If a variable is being tested when the prediction pool is empty, the
formula is nonessential. If the prediction of a variable being tested cannot
be found in the prediction pool (step 8), the formula is also nonessential
(Theorem 9).
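
The vector-per-functor pool of Algorithm 3 can be sketched as a stack of sets, where the boundary between two sets plays the part of the sentinel. This is a hedged illustration rather than Program 4-3: the token encoding `("F", n)` and `("x", s)`, the use of Python sets for the vectors, the step that writes an unfinished vector back to the pool, and the function name are all assumptions.

```python
def algorithm_3(tokens):
    """Return (essential, output) with the ordering restriction relaxed."""
    output = []  # the output file
    pool = []    # prediction pool: one set of predictions per functor
    for kind, value in tokens:
        if kind == "F":
            output += ["[", f"F({value})"]
            pool.append(set(range(1, value + 1)))  # vector 1, 2, ..., n
        else:
            if not pool:                  # variable tested on an empty pool
                return False, output
            b = pool.pop()                # step 7: read the last vector
            if value not in b:            # step 8: class not predicted
                return False, output
            output.append(f"x^{value}")   # step 9
            b.remove(value)               # step 10
            if b:
                pool.append(b)            # predictions remain (assumed step)
            else:
                output.append("]")        # functor completely analyzed
    return not pool, output


permuted = [("F", 2), ("x", 1), ("x", 2)]
repeated = [("F", 2), ("F", 2), ("x", 1), ("x", 1), ("x", 2), ("x", 2)]
too_high = [("F", 3), ("x", 2), ("F", 2), ("x", 3),
            ("x", 2), ("x", 1), ("x", 1)]
```

The permuted formula is accepted, the formula with two class 1 variables under the same functor is rejected at the second of them, and the formula in which a class 3 variable follows a functor of measure 2 is rejected as soon as that variable is tested.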


Example  .  After  analysis, 

A  -  \M  2[p(2)  2L(2)  1  2]  1]  1  3]  .  _  p(3)  1  p(2)  2  JZ)  2  112  3 

\  iFf  x^[r2  x^^F^  x^x^Jx^Jx^x^  J.  Ag  -  Fj^  x^,  F^  x^  x^x^x^x^x^. 

After  analysis, 

=  F^^x^  F^^x^  F^^^x^x^x^x^x^.  is  nonessential,  since 
A,,  =  I  Fp^x^  F^^^x^  F^^^x^x^*.*  and  the  two  variahlfis  associated  with  belong 

(3)  2  (2)  3211 

to  the  same  class.  A,  =  F^  'Xt  F'  'xix^x.x.o  A,  is  nonessential,  since 

411223454 
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^4  ~  '*  ^  vector  belonging  to  class  “3"  is  associated 

(2) 

with  a  functor  F'  . 

The proof of Lemma 8 is valid for Algorithm 3.

Lemma 11

Algorithm 3 is effective for an irreducible essential
segment $A_s$ of a formula $A = A_h A_s A_t$ if Algorithm 3 is
operable for $A_h$.

PROOF: The proof is similar to the proofs of Lemmas 9 and 10. If
Algorithm 3 is operable for $A_h$, step 3 is reached such that
$\Phi_0 = A_s A_t$ and $\Phi_2$ is an arbitrary function $g[A_h]$ of $A_h$.

Let $A_s = F^{(n)} x_n^{s_n} \cdots x_2^{s_2} x_1^{s_1}$, where $s_i \ne s_j$ for $i \ne j$, $0 < s_i \le n$,
and $0 < i \le n$.

After (3,6/3): $\Phi_0 = x_n^{s_n} \cdots x_2^{s_2} x_1^{s_1} A_t$,

$\Phi_2 = \{g[A_h]\}\{1, 2, \ldots, n-1, n\}$.

The next n-1 formula cycles are (3,10/3). At step 8, in each formula
cycle, there is a $b_i = s$ where $i \le L(\mathbf{b})$; also $\mathbf{b} \ne \Lambda$. After n-1 formula
cycles, step 3 is reached such that:

$\Phi_0 = x_1^{s_1} A_t$,

$\Phi_2 = \{g[A_h]\}, \{s_1\}$.
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The  next  formula  cycle  is  (3,lV3)» 


$\Phi_0 = A_t$,

$\Phi_2 = \{g[A_h]\}$.

Since $\Phi_{2,\mathrm{initial}} = \Phi_{2,\mathrm{final}}$, Algorithm 3 is effective for $A_s$.

Theorem 8 ($A_s$-theorem for Algorithm 3)

For an arbitrary $A = A_h A_s A_t \ne \Lambda$, where $A_s$ is an
essential formula, Algorithm 3 is effective for
$A_s$ if Algorithm 3 is operable for $A_h$.


PROOF: The inductive proof is parallel to that of Theorem 4 (with
Lemma 11 substituted for Lemma 9).


Theorem  9 

Algorithm  3  is  strictly  effective  if  and  only  if  A 
is  an  essential  formula.  A  pair  of  parentheses  is 
placed  around  every  segment  of  a  reduced  set  of  an 
essential  formula. 

PROOF:  (a)  Sufficiency:  by  Theorem  8,  if  Ag  =  A^  =  A  and  initial  ^  ^  * 
(B)  Necessity:  will  be  shown  by  an  indirect  proof.  A 
contradiction  will  be  deduced  from  the  hypothesis  that  the  algorithm  is 
strictly  effective  for  a  formula  A  that  is  not  essential. 

A  is  not  an  essential  foinaula  only  if  either: 

(1)  03^ 

(2)  M(A)  ^  0,  or 
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(3)  thore  are  two  oi*  more  variables  belonging  to  tbe 
same  class  associated  with  one  functor,  or 

(4)  there  is  a  variable  belonging  to  class  "n®  associated 
with  a  functor  ,  where  m  <  n. 

If  conditions  (1)  or  (2)  exist, the  proof  is  parallel  to  proof  B 
of  Theorem  5*  If  condition  (3)  exists,  there  are  an  x|  and  an  ^ 

(x:^^  preceding  X2)  associated  with  one  functor  F^,  such  that  when  0  =  3^ 
after  step  10;  b^^  J  for  any  b^^  left  in  b.  When  c  =  x^*  at  step  8; 

^  =  0  and  the  path  cannot  be  completed.  If  condition  (4)  exists,  when 
is being tested, there will be no "n" in b and £ = 0, so that the path
cannot be completed.

(C) Parentheses placement: the proof is parallel to proof C of
Theorem 5.

4. Further Modifications to the Essential Formula

It has been assumed in the model as developed in Sec. 3 that every
variable is a member of only one class, so that when a variable is being
tested only this one class is tested against the predictions in the pool.
In this section, the problem of a variable belonging to more than one class
will be considered. This is analogous in natural language to the possibility
of a word having more than one role. For example, in English, the word
"water" might refer, on the one hand, to the liquid, in which case "water"
is a noun, or, on the other hand, to the act of feeding plants, in which case
"water" is a verb.
The outcome of the modifications to be set forth in this section
is that a single pass through a formula will not necessarily be sufficient
to determine whether or not the formula is essential. On occasion it will
be necessary to make several passes before this is determined. Algorithms,
extended from those of the last section, will be given for a single pass of
the formulas being tested. Analogues of the theorems of Algorithms 1 to 3
do not exist for a single pass of Algorithm 4. The development of an
algorithm that will control the iteration of a sentence is a fruitful
field for further research. Meaningful theorems should be obtainable from
such a study.

A.  Multi-class  Variables 

In Algorithm 3, if each variable can belong to only one class, and
if  a  prediction  of  that  class  is  in  a  location  in  the  prediction  pool  where 
it  can  be  fulfilled,  the  variable  being  tested  is  accepted,  and  the 
algorithm  proceeds  to  test  the  following  character.  If  there  is  no 
appropriate  prediction  that  can  be  fulfilled,  the  variable  is  not  accepted 
and  the  entire  formula  is  rejected  as  nonessential. 

To  take  into  account  the  possibility  of  a  variable  belonging  to 
more  than  one  class,  the  following  definitions  will  prove  to  be  helpful. 

The analysis of a formula, which tests each character in the order of
occurrence once and only once, is defined as a pass. The set of
passes required to determine whether or not a formula is essential is
defined as an iteration.




Definition 20: A variable x can belong to any of the
classes α, β, and γ, where each of the classes is an
argument of x and each member of a set of classes is
an alternative argument of x.

Definition 21: The class to which a variable with alternative
arguments is assigned in the process of a syntactic
analysis of an essential formula is the preferred
argument.

Whereas  alternative  arguments  of  a  variable  are  known  qualities  of 
the  variable  being  tested  by  the  algorithm,  the  preferred  argument  is 
selected  from  the  alternative  arguments  according  to  the  contents  of  the 
prediction pool at the time of the test (Algorithm 4).

If it is assumed that there is no a priori preference for any
alternative argument or for any prediction in the pool, then all the
alternative arguments are compared with all the predictions preceding the
first sentinel (steps 8-10). When all the possible preferred arguments are
found, one of them is selected arbitrarily and entered on the output file
as the preferred argument (step 11). All others are recorded onto a
hindsight or temporary storage file (step 12). The prediction that was
fulfilled by the preferred argument is then removed from the prediction pool
(step 13), and this process is continued with the next character. When all
the variables associated with a given functor have been identified, a
right parenthesis is written on the output file (step 14).

This process must end with one of the three terminal conditions of
the algorithm. If the algorithm is strictly effective, then the algorithm
has successfully identified the formula as an essential formula and has
selected a preferred argument for each variable. If the process ends at
step 10, then the particular preferred arguments that were chosen did not
lead to the evaluation of an essential formula, but some other selection of
a preferred argument might possibly lead to the desired evaluation. If the
process ends either at step 4 and the path is not strictly effective or at
step 8, the formula is definitely nonessential.

[Program 4-4. Algorithm 4]

If  there  is  a  choice  of  alternative  arguments  at  step  11,  it  is 
impossible  to  determine  whether  the  appropriate  one  is  chosen  as  the 
preferred argument. Therefore, even if a strictly effective evaluation was
chosen,  other  alternative  evaluations  must  be  tried,  since  there  might  be  one 
or  even  more  than  one  additional  evaluation  for  which  the  algorithm  is 
strictly  effective.  Information  about  the  alternative  paths  is  available  on 
the  hindsight  file,  since  every  time  a  branching  point  in  the  analysis  occurs, 
all  the  alternative  preferred  arguments,  except  for  the  one  selected,  are 
recorded  there. 
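
The selection described in steps 8 to 13 can be sketched in Python. This is only an illustration under stated assumptions, not a transcription of Program 4-4: the name `select_preferred`, the representation of the pool as a list with a `SENTINEL` entry, and the use of "first candidate found" as the arbitrary choice are all hypothetical.

```python
SENTINEL = "|"  # hypothetical marker separating the predictions of successive functors


def select_preferred(alternatives, pool):
    """Pick a preferred argument for one variable.

    alternatives: classes the variable may belong to (its alternative arguments).
    pool: predicted classes, top of the pool first, with SENTINEL entries.
    Returns (preferred, hindsight, new_pool), or None when no prediction
    preceding the first sentinel can be fulfilled.
    """
    # Only predictions preceding the first sentinel are eligible (steps 8-10).
    limit = pool.index(SENTINEL) if SENTINEL in pool else len(pool)
    candidates = [a for a in alternatives if a in pool[:limit]]
    if not candidates:
        return None
    preferred = candidates[0]   # arbitrary choice (step 11)
    hindsight = candidates[1:]  # the others are kept for later passes (step 12)
    new_pool = list(pool)
    new_pool.remove(preferred)  # the fulfilled prediction leaves the pool (step 13)
    return preferred, hindsight, new_pool
```

Every non-empty hindsight list returned here marks a branching point that a later pass might revisit.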


Example  8.  ^  There  are  two  possible  analyses  of 

4^3^.  Either  ]or  =  [f^^^x^x^].  Is  an  essential 

(2) 

formula  but  4^  is  not.  Since  there  are  no  other  alternative  evaluations, 
a  unique  argument  can  be  assigned  to  each  variable. 

'x^’^x^.  There  are  two  analyses  that  lead  to  an  essential 

formulas  both  =  [f^^^x^x^x^]  and  -  F^^^x^x^x^j.  Therefore,  a 
unique  argument  cannot  be  assigned  to  each  variable.  A^= 

is nonessential, since no matter what evaluation is undertaken, an essential
formula cannot be found.


If an algorithm to keep track of alternative paths were available
and Algorithm 4 could be applied iteratively until either all the possible
combinations of preferred arguments were tried or until a terminal were
reached indicating that the formula was definitely nonessential, either
none,  one,  or  more  than  one  of  these  combinations  would  lead  to  a 
satisfactory  interpretation.  If  none  of  the  combinations  resulted  in  an 
essential  formula,  then  the  formula  would  be  nonessential.  If  one  and  only 
one  combination  resulted  in  an  essential  formula,  the  formula  would  be 
essential  and  unique  preferred  arguments  would  have  been  assigned  to  each 
variable.  If  more  than  one  combination  resulted  in  an  essential  formulation, 
the  formula  would  be  essential  but  not  all  the  variables  could  be  assigned 
unique  preferred  arguments. 
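
The three outcomes of such an iteration can be illustrated with a brute-force sketch. The source does not give a controlling algorithm, so everything here is an assumption: `classify` simply enumerates every combination, `is_essential` is a hypothetical callback standing in for one pass of Algorithm 4 on a fixed choice of preferred arguments, and the early exit on a "definitely nonessential" terminal is omitted.

```python
from itertools import product


def classify(alternatives_per_variable, is_essential):
    """Try every combination of preferred arguments.

    alternatives_per_variable: one list of alternative arguments per variable.
    is_essential: hypothetical predicate deciding whether one fixed
                  combination yields an essential formula.
    Returns (verdict, list of successful combinations).
    """
    successes = [combo for combo in product(*alternatives_per_variable)
                 if is_essential(combo)]
    if not successes:
        return "nonessential", successes
    if len(successes) == 1:
        return "essential, unique preferred arguments", successes
    return "essential, preferred arguments not unique", successes


# Toy predicate (an assumption for demonstration only): a combination is
# accepted when its classes are strictly increasing.
def increasing(combo):
    return all(a < b for a, b in zip(combo, combo[1:]))
```

The number of combinations grows as the product of the numbers of alternative arguments, which is one reason an algorithm that keeps track of alternative paths would be preferable to plain enumeration.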

B.  All  Predictions  Need  Not  be  Fulfilled 

It  has  previously  been  assumed  that  all  the  predictions  in  the  pool 
are  fulfilled  if  a  formula  is  essential.  However,  in  natural  language,  if, 
say, an object prediction is made for every clause, a clause without an
object  should  not  be  rejected. 

It is now assumed that, although M(F_k) is known, there need not
be as many as M variables associated with the functor. When the
alternative arguments of a variable do not correspond to any predictions
remaining in the pool preceding the first sentinel, but do correspond to a
prediction following the first sentinel, this is now the only indication
that all the variables associated with a functor have been identified.

The most striking difference between Algorithm 4 and Algorithm 5
is that there is one less terminal condition in the latter algorithm.
That a formula is nonessential can no longer be determined on a single
pass. If the algorithm ends at step 4 and is not strictly effective, or
if the algorithm terminates at step 8, it is merely an indication that
the chosen combination of preferred arguments does not yield an essential
formula.

Whereas in Algorithm 4, if q = Λ (step 10), there was an indication
that the chosen evaluation did not lead to an essential interpretation, in
Algorithm 5 it is necessary to assume that all the variables associated
with a given functor have been identified, to write a right parenthesis on
the output file, and to bring in the next set of predictions from the
prediction pool. The two algorithms are otherwise identical.
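
The changed behavior at step 10 can be sketched as follows. This is a hedged illustration rather than Algorithm 5 itself: the name `fulfil`, the list representation of the pool, the `SENTINEL` marker, and the "first match" choice are assumptions made for the example.

```python
SENTINEL = "|"  # hypothetical marker between the prediction sets of successive functors


def fulfil(alternatives, pool, output):
    """Fulfil one prediction for a variable, closing functors when needed.

    pool: predicted classes, top first, with SENTINEL between prediction sets.
    output: list standing in for the output file; extended only on success.
    Returns the updated pool, or None if no prediction can be fulfilled.
    """
    pool = list(pool)
    written = []
    while pool:
        limit = pool.index(SENTINEL) if SENTINEL in pool else len(pool)
        match = next((a for a in alternatives if a in pool[:limit]), None)
        if match is not None:
            pool.remove(match)
            output.extend(written + [match])
            return pool
        if limit == len(pool):
            return None  # no further set of predictions to bring in
        # Assume the current functor is complete: write a right parenthesis,
        # then drop its unfulfilled predictions together with the sentinel.
        written.append(")")
        del pool[:limit + 1]
    return None
```

In Algorithm 4 the first failed match would have ended the pass; here it merely closes the current functor and exposes the next set of predictions.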


^2^  2  1  1  1  1 

Example  9.  =  ^1^1  ^2  ^3^4*  analysis,  there  is 


(Z)  2f  31 

only  one  essential  formulation  of  “^x^x^ 

2 

is  no  X  associated  with  F2. 


11 


,  and  there 


(2)  2  (3)  3  1  12 

=  F^  Xj^  'x^x^x/  o  There  are  two  essential  formulations  of 


Either  A^  '  = 


3  4 

(1)  ,  rJ2)  2r^(3)  3  1  21 


*■1  '^1 


FJ;"x;x-x, 

2  234 


with  no  x"^  associated  with  F^,  or 


(2) 


p(2)  2L(3)  3  1 
L^l  ^1^2  2  3J 


X 


4J 


with  no  X  associated  with  Fg* 


C. Prediction Span Indicator


A  prediction  span  indicator,  a  device  not  used  in  any  of  the 
algorithms,  can  be  assigned  to  each  type  of  prediction  to  indicate 
whether  or  not  an  algorithm  is  leading  to  a  nonessential  solution. 

An analogous situation in natural language is that of the
prediction of a genitive modifier by a noun. Since the modifier need not
occur, the prediction must be marked so that if it remains unfulfilled, the
analysis is not judged incorrect.

This indicator specifies whether a given prediction may remain unfulfilled
in the analysis of an essential formula. When a given variable is being
tested, it is possible that a preferred argument cannot be selected on the
basis of the predictions preceding the first sentinel in the prediction pool.
In this case, before the unfulfilled predictions and the sentinel are wiped
from the pool (transfer (11/6) in Program 4-5), so that the algorithm can
continue to test on the next set of predictions in the pool, the prediction
span indicators of the unfulfilled predictions are tested. If any of the
wiped predictions require fulfillment, this is sufficient indication that the
selected preferred arguments are not leading to an essential formula.
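
The test made before wiping can be sketched as below. The sketch assumes a two-valued indicator (`required`), which is a simplification introduced here; the names `Prediction` and `wipe_first_set` and the use of `None` as the sentinel are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    cls: str        # predicted class
    required: bool  # hypothetical span indicator: True if it must be fulfilled


SENTINEL = None  # hypothetical sentinel entry in the pool


def wipe_first_set(pool):
    """Wipe the predictions preceding the first sentinel, and the sentinel.

    Returns (new_pool, ok). ok is False when a wiped prediction required
    fulfillment, i.e. the selected preferred arguments cannot be leading
    to an essential formula.
    """
    limit = pool.index(SENTINEL) if SENTINEL in pool else len(pool)
    wiped = pool[:limit]
    ok = not any(p.required for p in wiped)
    return pool[limit + 1:], ok
```

An optional prediction, such as the genitive modifier predicted by a noun, would carry `required=False` and could be wiped without penalty.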


5.  Correlation  of  the  Essential  Formula  Model  with  Natural  Language 


To correlate the model with natural language, the structure and
analysis of natural language will be put into abstract algebraic terms. A
sentence in a natural language consists of a finite set of elements in a
given order. Since, in general, a word tested out of context gives no
information about the neighboring words, the elements of the sentence may
be considered as variables. A sentence can then be described as a sequence,
S = [x_1, x_2, ..., x_n], where words, punctuation marks, as well as other
symbols are its elements.


The set of alternative arguments associated with each word can be
retrieved from a dictionary (such as the Russian-English automatic dictionary
described in Chap. 3). The information available for syntactic analysis can
be expressed as follows:

$$S = \left[ x_1^{a_1, b_1, c_1, \ldots},\; x_2^{a_2, b_2, c_2, \ldots},\; \ldots,\; x_n^{a_n, b_n, c_n, \ldots} \right],$$

where the right superscripts represent the alternative arguments of $x_i$.

It  should  be  noted  that  functors  are  not  explicit  in  this  representation 
of  a  sentence. 

The  method  of  predictive  syntactic  analysis  consists  of  the 
selection  of  a  preferred  argument  from  the  predictions  in  the  pool.  The 
arrival  at  a  syntactic  analysis  of  a  sentence,  including  the  establishment 
of relationships among the words in the sentence, implies that the
"functors" are recognized in the analysis. The functors cannot be
determined  by  an  examination  of  the  individual  words;  their  occurrence  can 
only  be  established  from  the  preferred  argument  and  the  prediction  which 
selected  it. 


If  represents  the  preferred  argument  of  word  x^  selected  by 
prediction p^ from among the alternative arguments then the

relationship  of  the  functor  to  the  preferred  argument  and  to  the 


prediction  can  be  formalized  as  Fj  =  Fj(p^,q^j),  where  Fj,  as  a  function 
of  p^  and  q^j,  represents  the  role  played  by  x^.  in  its  environment  in  a 
particular  sentence.  An  analyzed  sentence  can  then  be  represented  by 


<  i, where  is the index of the variable making the prediction that
has selected the preferred argument. The preferred argument is denoted by
the right superscript, and the prediction selecting the preferred argument
is denoted by the left superscript.

Each  word  in  a  sentence  now  has  two  functions:  (1)  it  assumes  the 
role  of  a  variable  in  fulfilling  a  prediction  previously  placed  in  the 
prediction pool; and (2) as a function of p^ and q^j, it assumes the role
of  a  functor,  and  makes  further  predictions  which  will  be  placed  in  the 
prediction  pool. 

The representation of an analyzed sentence is an attempt to illustrate
what is known about the sentence and its individual words as it is
being  analyzed.  Obviously,  some  information  is  known  about  a  sentence  before 
the  analysis  of  the  sentence  even  begins.  For  example,  every  sentence  is 
expected  to  have  a  subject  and  a  predicate  as  well  as  a  period  or  some  other 
punctuation  mark  denoting  its  completion.  An  initial  symbol  is  introduced 
to  denote  this  information  so  that 


To  complete  the  correlation  of  the  essential  formula  model  with  this 
notion  of  a  natural  language,  a  linkage  or  merger  of  every  functor  with  the 
immediately  preceding  variable  is  hypothesized.  The  variable  then  becomes 
the representation for a word, and the functor becomes part of this
representation and need not be considered a separate entity. In Example 10, the
merger of a functor and the variable immediately preceding it is indicated
by a pair of slurs.


Example  10.  A  After  analysis 


.(2)  IL (3)  13  2  If  (1).  11  2 


^2^3^4 
[113  2  1

The sentence represented by this example would be [^x^,X2,XyX^,x^,x

12  13  2

where x^^^ and x^ are selected by the initial predictions; x^, x^, and x^ are

selected  by  predictions  generated  by  x^;  and  x^  is  selected  by  a  prediction 

generated  by  x^. 

6.  Conclusions 

Although  the  formal  development  of  the  model  stems  from  several 
previously published papers, the main inspiration came from a careful study
of  Rhodes'  empirical  predictive  syntactic  analysis  technique,  as  applied  to 
Russian. 

It is assumed that the structure of the Russian language is nested
in the manner of the Z^-theorem. That is, if a sentence is interrupted by
a phrase or clause, the embedded phrase or clause will have been analyzed
completely before the analysis returns to the main part of the sentence.
The phrase or clause will have no effect on the words following it. This
nesting feature was brought out in the theorems beginning with Theorem 2,
where it was shown that an essential segment, a nested structure, could be
removed from an essential formula, leaving the resulting formula essential.
The unique decomposition theorem (Theorem 3) indicated that a sentence in
the model, like most sentences in the Russian language, could be decomposed
uniquely into its phrases and clauses.

In the experimental program (Chap. 5), it will prove convenient to
extend the concept of nesting in natural language. Individual phrases and
clauses can be considered as structures within which nesting can occur.
For example, a clause can be divided into three nested structures: all the
words constituting the subject, the predicate, and the object. Each of
these structures might contain other nested structures. Therefore, if the
sentence is to remain grammatically complete, a random nested structure
cannot be removed from it.

Theorems 2 and 3 also point out the main difference between a
well-formed formula and an essential formula. Any well-formed segment of a
well-formed formula is firmly connected to the larger structure of the
formula. The well-formed segment can be removed and a simple variable
substituted in its place, but some symbol must remain to indicate the
presence of the well-formed segment in the original formula. At the same
time an essential section in an essential formula represents a structure
completely subordinate to a variable, which in turn is tied to the larger
structure of the formula. Whether or not the subordinate structure is
present is immaterial.


This difference can be best illustrated with two examples. Consider
the well-formed formula  , where the parentheses are
used to indicate the well-formed segments in the formula. (The individual
variables are well-formed formulas, but their parentheses have been omitted
for clarity.) The well-formed segment  can be replaced by a
variable, but the complete absence of the segment with no substitute would
render the formula non-well-formed. In contrast, consider the essential formula
$A = [F_1^{(2)} x_1 [F_2^{(2)} x_2 x_3] x_4]$. The essential segment
$[F_2^{(2)} x_2 x_3]$ can be removed from the formula and the variables
$x_1$ and $x_4$ will remain to satisfy the predictions from $F_1$, and
$[F_1^{(2)} x_1 x_4]$ still will be an essential formula.


A second assumption made about the Russian language was that the
syntactic role of a nested structure in the larger framework in which it
is embedded could be completely determined by the syntactic role played
by its first word. Exceptions to this assumption exist in the language,
but it seems that they occur rarely enough to permit their analysis by
more circuitous methods without a sacrifice of the efficiency of the
predictive syntactic method as a whole. In the model this first word of
a nested structure is represented by the variable which also takes on the
role of a functor. As a variable, it fulfills the role of the entire
nested structure in the larger structure. As a functor, the first word
forms the ties to bring together all the words within the nested structure.

Such an assumption cannot be made consistently about the English
language. When the two Russian noun phrases, большой дом and большие дома,
are compared with their English counterparts, "the big house" and "the big
houses", it can be seen that, in the Russian phrases, number (singular and
plural, respectively) is indicated by the paradigmatic forms of the
adjectives, whereas number is not indicated by the adjectives in the
English phrases. Also, the paradigmatic forms of the Russian adjectives
indicate case, information that is completely lacking in the English
equivalents. To determine the complete specifications of the English
noun phrases, it is necessary to look at the nouns as well as at the
adjectives preceding them.

A partial verification of the usefulness of the model of the
essential formula will be presented in the experimental results described
in Chap. 5.
CHAPTER 5

AN EXPERIMENTAL SYNTACTIC ANALYZER


1.  Introduction 

The experimental syntactic analyzer presented in this chapter is a
system that syntactically analyzes Russian sentences by a left-to-right
pass utilizing the predictive syntactic analysis technique discussed in the
preceding chapter. The present experimental program, which was written in
January 1960, will be discussed from the point of view of several problem
areas. The discussion of these areas should provide an adequate indication
of the approach of predictive analysis, as well as the more pertinent details
of operation, but no systematic attempt will be made to consider all the
aspects of the program in complete detail.

The various rules by which this program operates constitute a
verifiable although incomplete grammar of the Russian language. Traditional
grammars abound with exceptions to the rules that are stated. The grammatical
rules that are used in the syntactic analyzer will have to account for these
exceptions if all sentences are to be analyzed by the program. Thus, it is
necessary to find broad rules which govern the behavior of the exceptions as
well as the more usual occurrences. Through these rules, the main goal of
the experimental analyzer is to eliminate any ambiguity in the syntactic roles
that are played by the words in a sentence. As the program is improved, the
grammar of the program will better approximate the grammar of the Russian
language.

The program for this system was written by W. Bossert.
The experimental program is not a method for obtaining rules, as is
the proposed trial translator or algorithm finder of Giuliano. The only
limitations on rules to be utilized in predictive analysis are that the
words in the sentence under analysis must be scanned in a left-to-right order,
and that the predictions must be stored in such a manner as to adhere to the
basic nesting characteristic (Chapter 4) which, according to Yngve's
hypothesis, is applicable to many natural languages. Within these
constraints, anything can be tried. A continuous attempt is made to keep the
rules as systematic as possible in order to keep the data handling mechanism
to a minimum. The rules that have been adopted to date in the experimental
program are due to a knowledge of the Russian language systematically
organized in existing grammars, elicited from native informants, and obtained
as a consequence of earlier experiments.

After new rules are developed, the experimental program existing at
that time is modified so that the new rules are incorporated. Several texts
are analyzed with the revised program, and the output is then studied to
determine whether the theories expressed by the new rules have been
substantiated. There are usually many exceptions to new rules. These exceptions
become obvious when the new rules are applied systematically to several texts,
and then newer, more complete rules can be established.

Many of the subroutines used in the experimental program are named
after classical grammatical terms, such as subject prediction. All of these
classifications are explicitly defined within the context of the experimental
program. These definitions need not coincide with the classical grammatical
definitions, but they resemble the classical definitions closely.
Although it is not necessary to analyze many texts to determine
the  faults  and  limitations  of  a  version  of  the  experimental  program,  it 
is  dangerous  to  reprocess  the  same  texts  for  more  than  a  few  versions  of 
the  program.  If  the  same  texts  are  used  repeatedly,  the  syntactic  analysis 
program  becomes  a  program  specifically  designed  to  analyze  the  writing 
styles  of  the  several  authors  of  the  test  texts.  All  the  illustrations 
of  actual  analyzed  output  have  been  taken  from  text  OOA^,  a  text  that  was 
used  for  two  versions  of  the  experimental  program  and  is  therefore  not 
suitable  as  test  material  for  future  versions. 

In  the  discussion  of  this  chapter  it  is  essential  to  distinguish 
errors  from  mistakes.  An  error  is  a  faulty  decision  in  the  experimental 
program  which  leads  to  an  incorrect  analysis  of  a  sentence  where  the 
difficulty  is  recognized  by  some  technique  in  the  program.  A  mistake  is 
a similar faulty decision where there is no indication that an incorrect
analysis  has  been  made. 

In this chapter, the mechanism of predictive analysis is introduced
with the analysis of two short sentences by a greatly simplified version
of the present program (Sections 2 and 3). The details of the experimental
program are presented in Section 4. The following four sections (Sections 5
through 8) are devoted to discussions of examples of output that demonstrate
various interesting features of the program; and a brief summary of problems
that are still to be solved is given in Section 9.

2.  An  Illustration  of  Predictive  Syntactic  Analysis 

The method of predictive syntactic analysis will be exemplified by
the analysis of the simple sentence KpacntiK CTom imeer mm (Fig. 5-1).
To make the analysis procedure more lucid, a greatly simplified version of
the analysis technique is illustrated. The number of predictions in the
prediction pool is reduced, and only a small but essential fraction of the
predictions is depicted. The experimental system will be discussed in
Sections 4 to 9.

The format of Fig. 5-1 is indicative of the information that is
stored in the computer memory, although, obviously, in the memory the
information need not be literally spelled out. Seven concepts introduced
in Chapter 4 have been utilized in this representation.

(1) Alternative argument - The starting point of predictive
analysis is the information about the arguments of words that is obtainable
from a dictionary. Since the lexical properties of words do not always
define a unique argument, a set of alternative arguments must be considered.
An alternative argument will be noted in this chapter by a pair of slashes;
thus, столы has two alternative arguments, /noun, nominative, plural,
masculine/ and /noun, accusative, plural, masculine/. This concept of
argument and alternative argument is completely parallel to Definition 20
of Section 4.4A.

(2) Prediction pool - The program analyzes every word in a
sentence by attempting to fulfill predictions which are potential grammatical
relationships among the words of a sentence. The predictions are stored in a
prediction pool which is operated approximately as a pushdown store, in the
sense that the last prediction entered into the pool is the first one
tested for fulfillment.
(3) Prediction span indicator (PSI) - A prediction span
indicator is assigned to each prediction indicating how long the prediction is
to be allowed to remain in the pool. The prediction span indicators used
in the simplified illustration are:

PSI  =  00  -  The  prediction  must  be  fulfilled  by  the  next 
word  in  sequence  or  not  at  all. 

PSI  =  01  -  The  prediction  must  be  fulfilled  during  the 
analysis  of  the  sentence. 

PSI  =  02  -  The  prediction  may  be  fulfilled  more  than  once 
in  a  single  sentence  and  therefore  must  never 
be  wiped  (that  is,  erased)  from  the  prediction 
pool. 

These definitions of the prediction span indicators are intended solely for
the illustration of the simplified program. New PSI definitions will be made
when the present experimental program is discussed in detail (Section 4).

(4)  Intersection  -  In  testing  the  alternative  arguments  of  a 
word  against  the  predictions  in  the  prediction  pool,  an  intersection  takes 
place  when  an  alternative  argument  can  fulfill  a  prediction. 

(5) Preferred argument - The preferred argument is the alternative
argument of the first intersection in a test sequence (see Definition 21,
Section 4.4A). In the test sequence, all the alternative arguments of a word
are tested against all the predictions in the pool in their respective orders,
such that each prediction, in turn, is tested against the set of alternative
arguments. The prediction that intersects with the preferred argument becomes
known as the attributed argument of the word.
(6) Hindsight - During analysis, information, other than the
preferred  argument,  that  has  to  be  stored,  is  put  onto  a  second  output  file, 
called  the  hindsight.  For  example,  if  more  than  one  alternative  argument 
intersects  with  a  prediction  in  the  pool,  all  intersecting  alternative 
arguments  but  the  first,  which  is  the  preferred  argument,  are  put  into 
hindsight. 

(7)  Chain  number  -  The  chain  number  is  an  index  that  is 
incremented  whenever  the  predictive  syntactic  analysis  program  cannot,  on 
the  basis  of  the  predictions  stored  in  the  prediction  pool,  select  a 
preferred  argument  for  a  word. 
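
The test sequence that ties concepts (4) through (7) together can be sketched in Python. This is a simplified illustration, not the experimental program: the function name `analyze_word`, the representation of a prediction as a (name, predicate) pair, and the tuples used for alternative arguments are assumptions introduced for the example.

```python
def analyze_word(alternatives, pool, chain_number=0):
    """Run the test sequence for one word.

    alternatives: alternative arguments of the word, in dictionary order.
    pool: list of (prediction_name, accepts) pairs, last-entered first;
          `accepts` is a hypothetical predicate that is True when an
          alternative argument can fulfill the prediction (an intersection).
    Returns (preferred, attributed, hindsight, chain_number).
    """
    for name, accepts in pool:                          # each prediction in turn ...
        hits = [a for a in alternatives if accepts(a)]  # ... against all arguments
        if hits:
            # First intersection: preferred argument and attributed argument;
            # the other intersecting arguments go to hindsight.
            return hits[0], name, hits[1:], chain_number
    # No intersection: no preferred argument, so the chain number is incremented.
    return None, None, [], chain_number + 1


# The two alternative arguments of a nominative/accusative plural noun.
NOM = ("noun", "nominative", "plural", "masculine")
ACC = ("noun", "accusative", "plural", "masculine")
```

Because predictions form the outer loop, the order of the pool matters more than the order of the alternative arguments.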

The first step in the program, at the beginning of each sentence,
is to set the chain number to zero and insert an initial set of predictions
into the prediction pool (Fig. 5-1.2). The PSI and the source of the
prediction are stored with each prediction. The symbol "INIT." refers to
the seven initial predictions. The four predictions with PSI = 01, subject,
predicate head, object, and end of sentence, predict the corresponding
elements of the sentence which, for the purpose of this example, are
self-explanatory. The functions of the other three predictions will be
discussed subsequently in Sections 3 and 5.
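
The initialization step can be pictured with a small data structure. The following is only a sketch: the class and field names are hypothetical, and the PSI values of the infinity and arbitrary choice predictions, as well as the relative order of the seven entries, are assumptions inferred from the worked examples rather than taken from the program itself.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    name: str    # syntactic element being predicted, e.g. "subject"
    psi: str     # prediction span indicator: "00", "01" or "02"
    source: str  # "INIT." or the word that made or last modified the prediction


def initial_pool():
    """Return the seven initial predictions, topmost first.

    The four PSI = 01 entries and the PSI = 02 of end wipe are stated in
    the text. Everything else about this list is an assumption.
    """
    return [
        Prediction("subject", "01", "INIT."),
        Prediction("predicate head", "01", "INIT."),
        Prediction("object", "01", "INIT."),
        Prediction("infinity", "02", "INIT."),          # PSI assumed
        Prediction("end wipe", "02", "INIT."),
        Prediction("end of sentence", "01", "INIT."),
        Prediction("arbitrary choice", "02", "INIT."),  # PSI assumed
    ]


# State at the beginning of each sentence.
chain_number = 0
pool = initial_pool()
```

Storing the source with each prediction is what later allows the output files to name the word responsible for an attributed argument.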

Each word is processed by the program in a three-step cycle:
(1) the alternative arguments of the word are placed in a central memory
location; (2) each prediction is tested against all the alternative
arguments of the word, the preferred argument is identified and noted, and
the appropriate information is recorded on hindsight; (3) the prediction
pool is updated. An accurate syntactic analysis is closely tied to the
ordering of the predictions in the pool and of the alternative arguments of
the words. The ordering of the alternative arguments is only secondary,
however, since all the alternative arguments are tested against every
prediction in turn.
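
Step (2) of the cycle can be sketched as a pair of nested loops, with the predictions in the outer loop so that their ordering dominates. The function name, the tuple layout of an alternative argument, and the toy rule table below are illustrative assumptions, not the grammar of the experimental program.

```python
def run_test_sequence(pool, alternatives, fulfills):
    """Test each prediction, in pool order, against all alternative arguments.

    Returns (preferred, hindsight). preferred is the (prediction, argument)
    pair of the first intersection, or None if there is no intersection.
    hindsight lists every later intersection.
    """
    preferred = None
    hindsight = []
    for prediction in pool:            # outer loop: order of the pool
        for argument in alternatives:  # inner loop: order of the arguments
            if fulfills(prediction, argument):
                if preferred is None:
                    preferred = (prediction, argument)
                else:
                    hindsight.append((prediction, argument))
    return preferred, hindsight


# Toy rule table keyed by prediction name (an assumption for illustration).
ACCEPTED_CASES = {
    "subject": {"nominative"},
    "predicate head": set(),
    "object": {"accusative", "instrumental"},
}


def fulfills(prediction, argument):
    # argument = (part of speech, case, number, gender)
    return argument[1] in ACCEPTED_CASES[prediction]


pool = ["subject", "predicate head", "object"]
adjective_args = [
    ("adjective", "nominative", "singular", "masculine"),
    ("adjective", "accusative", "singular", "masculine"),
]
preferred, hindsight = run_test_sequence(pool, adjective_args, fulfills)
```

With these toy rules the nominative reading is paired with the subject prediction and the accusative reading is left on hindsight against the object prediction.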

After the alternative arguments of Красный are brought into memory
(Fig. 5-1.3), the testing for intersections begins. The first intersection
is found in the test of the first alternative argument, /adjective, nominative,
singular, masculine/, against the first prediction, subject. The preferred
argument (Fig. 5-1.4) and the attributed argument, together with the source
of the fulfilled subject prediction, are entered on the main output file
(which is labeled in Fig. 5-1 "preferred argument"). The subject prediction
is crossed out to indicate that, since it has been fulfilled, it will be wiped
from the pool when the pool is updated.

The testing for intersections is continued. No intersections are
encountered in the tests between the first prediction and the second
alternative argument, the second prediction and either alternative argument,
and the third prediction and the first alternative argument. A second
intersection is discovered between the object prediction and the alternative
argument, /adjective, accusative, singular, masculine/. Since the preferred
argument has already been established, this intersection is recorded on the
hindsight file (Fig. 5-1.5).

It is necessary to record the alternate intersections, since the
selection of Красный as the subject is made arbitrarily, based only on the
ordering of the predictions in the pool. In the analysis of any sentence,
there is no way of knowing whether the arbitrary selection is the correct
one, without analyzing the remainder of the sentence. In the event it is
discovered later that the selection was made inappropriately, the hindsight
will contain a list of the other possible alternatives which can be
substituted for the inappropriate one.

The  ordering  of  the  predictions  in  the  pool  is  of  primary  importance 
in  the  analysis  of  a  sentence.  The  predictions  that  are  expected  to  be 
fulfilled  first  in  regular  sentences  are  placed  toward  the  top  of  the  pool. 
Thus  the  subject  prediction  is  above  the  predicate  head  prediction,  which, 
in  turn,  is  above  the  object  prediction.  If,  at  a  given  point  in  the 
analysis  of  a  sentence,  there  is  a  choice  of  several  predictions  which 
might  be  fulfilled,  then  the  most  likely  prediction  will  provide  the  first 
intersection.

After  the  second  intersection,  the  testing  for  intersections  is 
continued  once  more,  but  no  more  intersections  are  found.  After  the 
completion of the testing phase, the prediction pool is updated. The
fulfilled  subject  prediction  is  wiped  from  the  pool.  Every  adjectival 
preferred  argument  generates  a  master  prediction  with  PSI  =  00,  where  a 
master  is  defined  as  a  noun  or  another  adjective  following  immediately 
after  the  analyzed  adjective  and  agreeing  with  the  analyzed  adjective  in 
case,  number,  and  gender  (Fig.  5-1.6). 

Also, after identifying the subject of the sentence, it is possible
to modify the predicate head prediction, since the predicate must
agree with the subject in person, number, and gender. In this particular
example, the predicate head is modified so that only a third person,
singular, masculine predicate can fulfill the prediction.

The source of both the master prediction and the modified predicate
head prediction is listed as "WD 1", referring to the first word of the
sentence. The source of a modified prediction is always listed as the
number of the last analyzed word that has modified the prediction.
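
The update that follows an adjectival subject can be sketched as three actions: wipe the fulfilled prediction, restrict the predicate head prediction, and enter a master prediction at the top of the pool. The dictionary layout, the "requires" field, and the function name are hypothetical; only the three actions themselves come from the description above.

```python
def update_after_adjective_subject(pool, fulfilled_name, argument, word_no):
    """Return a new pool after an adjective has been selected as subject."""
    _, case, number, gender = argument
    updated = []
    for prediction in pool:
        if prediction["name"] == fulfilled_name:
            continue  # a fulfilled prediction is wiped from the pool
        prediction = dict(prediction)  # copy, so the old pool is left intact
        if prediction["name"] == "predicate head":
            # The predicate must agree with the subject.
            prediction["requires"] = {
                "person": "third", "number": number, "gender": gender,
            }
            prediction["source"] = word_no  # last word to modify it
        updated.append(prediction)
    # Every adjectival preferred argument generates a master prediction
    # with PSI = 00 that agrees in case, number, and gender.
    master = {
        "name": "master",
        "psi": "00",
        "requires": {"case": case, "number": number, "gender": gender},
        "source": word_no,
    }
    return [master] + updated


old_pool = [
    {"name": "subject", "psi": "01", "source": "INIT."},
    {"name": "predicate head", "psi": "01", "source": "INIT."},
    {"name": "object", "psi": "01", "source": "INIT."},
]
new_pool = update_after_adjective_subject(
    old_pool, "subject", ("adjective", "nominative", "singular", "masculine"), 1
)
```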

The testing cycle for Красный has been completed and a new cycle
is started by bringing into memory the alternative arguments of the second
word, the noun стол (Fig. 5-1.7).

Two intersections are found when testing the alternative arguments
of стол against the predictions in the pool (Fig. 5-1.8). The preferred
argument and attributed argument, due to the first intersection between the
master prediction and the alternative argument, /noun, nominative, singular,
masculine/, are recorded on the main output file. The second intersection
between the object prediction and /noun, accusative, singular, masculine/
is posted on the hindsight file.

The prediction pool is then updated (Fig. 5-1.9). Every nominal
preferred argument produces a noun complement prediction which can be
fulfilled by an adjective or noun in the genitive case following immediately
after the analyzed noun. The noun complement replaces the fulfilled master
prediction at the top of the pool. Since there are no other modifications
to the prediction pool, the alternative argument of the following word, the
verb имеет, is brought into the central memory location (Fig. 5-1.10).

Only one intersection is discovered, resulting in the attributed
argument, predicate head, and the preferred argument, /verb, third person,
singular, present tense, indicative, transitive/ (Fig. 5-1.11).

In updating the prediction pool, the noun complement together with
the predicate head is wiped, since the PSI of the former prediction is 00
and the prediction has not been fulfilled. Since the verb is transitive,
the object prediction can be modified so that only an accusative object can
fulfill the prediction (Fig. 5-1.12). Prior to this modification, either
an accusative object or an instrumental object would have been accepted.

After the prediction pool is updated, the three alternative arguments
of the noun ноги are brought into the central memory location
(Fig. 5-1.13). The testing for intersections is then resumed.

There is a single intersection resulting in the attributed argument,
object, and in the preferred argument, /noun, accusative, plural, feminine/
(Fig. 5-1.14). After this information is recorded on the main output file,
the prediction pool is updated once again. Since the last analyzed word
had a nominal preferred argument, a noun complement prediction is entered
at the top of the pool (Fig. 5-1.15).

The single alternative argument of the punctuation mark, /period/,
is then brought into the central memory location (Fig. 5-1.16). Testing of
the alternative argument against the predictions in the pool produces one
intersection, which results in the preferred argument, /end of sentence/
(Fig. 5-1.17). The prediction pool is updated for the last time, and both
the noun complement and the end of sentence predictions are wiped, the
former because its PSI equals 00. The analysis is now complete (Fig. 5-1.18).

The results of this analysis will now be reviewed. For every word
in the sentence a preferred argument has been selected according to the
contents of the prediction pool. This is indicated by the fact that the chain
number is still zero. No predictions with PSI = 01 remain in the prediction
pool, which indicates that every prediction that was expected to be fulfilled
was indeed fulfilled during the analysis of the sentence.

These two results, chain number equal to zero and no remaining
predictions with PSI = 01, occurring together, give a strong indication
that a correct syntactic analysis of the sentence has been obtained.
This is not meant to imply that the analysis is both unique and correct.
A stronger indication would exist if, in addition, there were no information
recorded on the hindsight file. To determine whether another analysis is
feasible, the entire analysis procedure must be repeated and the first
word must be considered as the object of the sentence. In this example,
of course, no alternative analysis is possible.
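
The review of the results amounts to two checks on the final state. The sketch below assumes a pool represented as (name, PSI) pairs and a hindsight list; the function name and the sample values are hypothetical.

```python
def review_analysis(chain_number, remaining_pool, hindsight):
    """Return (likely_correct, nothing_on_hindsight).

    likely_correct: chain number is zero and no PSI = 01 prediction remains.
    nothing_on_hindsight: likely_correct holds and the hindsight file is
    empty, which the text describes as a stronger indication.
    """
    likely_correct = chain_number == 0 and all(
        psi != "01" for _, psi in remaining_pool
    )
    nothing_on_hindsight = likely_correct and not hindsight
    return likely_correct, nothing_on_hindsight


# Final state resembling the first example: only PSI = 02 entries remain
# (their PSI values are assumed), and two alternate intersections were
# recorded on hindsight.
final_pool = [("infinity", "02"), ("end wipe", "02"), ("arbitrary choice", "02")]
alternates = [("object", "adjective, accusative"), ("object", "noun, accusative")]
result = review_analysis(0, final_pool, alternates)
```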

3.  End  Wipe  and  Arbitrary  Choice  Predictions 

The analysis of the sentence, Красный стол имеет ноги, proceeded in
a straightforward manner. The output of the program was a correct syntactic
analysis; as a matter of fact, the only possible correct analysis. Such a
simple sentence can always be correctly analyzed on a single pass.

The true merits of predictive syntactic analysis become evident only
when the ability of the program to detect errors in analysis and to record
clues for a projected correcting pass is considered. If it is assumed that
(1) the sentence being analyzed is grammatically correct, so that there is
no need to test whether or not the sentence is grammatical, but only to find
a grammatical formulation of the set of alternative arguments, (2) all the
words have been found in a dictionary in which there are no errors, and (3)
the words in the sentence have not been misspelled, then the two predictions,
end wipe and arbitrary choice, provide a mechanism for the detection of
errors in the analysis. The rules for the operation of these two predictions
in the existing program are as follows:

(1) End Wipe - If no intersection has been discovered in the
testing of all the predictions located in the pool above the end wipe
prediction against the set of alternative arguments of the word currently
being tested, then all of the tested predictions, including the end wipe,
are to be wiped from the prediction pool.

For the purposes of this simplified example, however, only such
predictions that do not have a PSI = 02 will be wiped from the prediction
pool. Since the end wipe prediction itself has a PSI = 02, it will not be
wiped. So long as only a simple sentence is considered, the scheme adopted
for this example cannot be distinguished from the one that is used in the
experimental program.

(2)  Arbitrary  Choice  -  If  no  intersection  has  been  discovered 
in  the  testing  of  all  of  the  predictions  located  in  the  pool  above  the 
arbitrary  choice  prediction  against  the  set  of  alternative  arguments  of  the 
word  currently  being  tested,  then  the  first  alternative  argument  of  the 
word  is  to  be  selected  as  the  preferred  argument,  the  attributed  argument 
arbitrary  choice  is  to  be  assigned  to  the  word,  all  other  alternative 
arguments  of  the  word  are  to  be  listed  on  the  hindsight  file,  and  the  chain 
number  is  to  be  incremented. 
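
The two rules, in the simplified form adopted for the example, can be sketched as special cases inside the testing loop. This is an illustration under stated assumptions: the pool is a list of (name, PSI) pairs, the PSI values of the infinity and arbitrary choice predictions are assumed to be 02, and the removal of fulfilled predictions during updating is left out.

```python
def process_word(pool, alternatives, fulfills, chain_number):
    """One testing pass over the pool, honoring end wipe and arbitrary choice.

    Returns (preferred, hindsight, kept, chain_number), where preferred is
    an (attributed argument, preferred argument) pair.
    """
    preferred = None
    hindsight = []  # later intersections, wiped PSI = 01 predictions, unused arguments
    kept = []       # predictions that survive this pass, in pool order
    for name, psi in pool:
        if name == "end wipe":
            if preferred is None:
                # Simplified rule: wipe the tested predictions, except PSI = 02.
                for wiped in [p for p in kept if p[1] != "02"]:
                    if wiped[1] == "01":
                        hindsight.append(("wiped", wiped[0]))
                kept = [p for p in kept if p[1] == "02"]
        elif name == "arbitrary choice":
            if preferred is None:
                preferred = ("arbitrary choice", alternatives[0])
                hindsight.extend(("alternative", a) for a in alternatives[1:])
                chain_number += 1
        else:
            for argument in alternatives:
                if fulfills(name, argument):
                    if preferred is None:
                        preferred = (name, argument)
                    else:
                        hindsight.append((name, argument))
        kept.append((name, psi))
    return preferred, hindsight, kept, chain_number


# A pool in which nothing can accept a singular verb.
pool = [
    ("noun complement", "00"), ("predicate head", "01"), ("object", "01"),
    ("infinity", "02"), ("end wipe", "02"), ("end of sentence", "01"),
    ("arbitrary choice", "02"),
]
verb_args = ["verb, third person, singular"]
outcome = process_word(pool, verb_args, lambda name, arg: False, 0)
```

Because the end of sentence prediction lies below the end wipe sentinel, it has not yet been tested when the wipe occurs and therefore remains in the pool.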

The end wipe prediction serves a double purpose when used in the
manner outlined. Primarily, it functions in the prediction pool as a
sentinel designating the end of a set of predictions of a given nested
structure in the sentence (see Section 4.3). Having reached this sentinel
with no previous intersections, it is assumed by the program that the
nested structure has been completely analyzed, and the word being analyzed
belongs to another nest in the sentence. This function of the end wipe
prediction is not self-evident in the simple example of this section, but
will be pointed out later when actual output of the predictive syntactic
analysis program is studied.

The second function of the end wipe prediction is to provide a
mechanism to wipe the entire prediction pool in the event an error is
discovered. An error in analysis is assumed whenever there are no
intersections between the alternative arguments of a word and the predictions
in the pool. Since an error is always discovered after the fact, there is
a question as to which predictions in the prediction pool might be
meaningless because of the propagation of this error. Rather than leaving the
predictions in the pool and continuing the possibility of propagating an
error after its existence has been ascertained, the predictions in the pool
with several exceptions are wiped and the analysis continues with a clean
slate. The second function is actually a special case of the first function
when all the predictions in the pool are considered as the nested structure
of the sentence as a whole.

The  significance  of  this  wiping  operation  is  that  whereas  any 
nested  structure,  the  beginning  of  which  has  already  been  recognized  and 
for  which  predictions  have  been  made,  will  not  be  analyzed  completely, 
complete  nested  structures,  occurring  to  the  right  of  the  word  which  causes 
the  wiping  of  the  prediction  pool,  will  be  analyzed  correctly.  For  languages 
in  which  a  (i^-theorem  holds  this  is  true,  as  has  been  proven  for  certain 
artificial  languages. 

The predictive syntactic analysis method requires that a preferred
argument be selected for every word. Even if the attributed argument is
an arbitrary choice, new predictions can be generated for the updated
prediction pool by the preferred argument, and so any nested structure which
can be predicted by the word labeled arbitrary choice can be identified on
the same pass.

Since the first alternative argument is arbitrarily chosen as the
preferred argument, in the event it is discovered later that this choice
was made in error, a list of the other potential choices will be available
in the hindsight file. If the alternative arguments are ordered in
decreasing probability of occurrence, then arbitrary choice preferred
arguments will have the best opportunity of being selected correctly. As
was mentioned earlier, the ordering of the alternative arguments is only
secondary, however, since all the alternative arguments are tested against
each prediction in turn. In the instances where more than one intersection
is found, the greatest effect on the selection of the preferred argument
will be the ordering of the predictions in the prediction pool, as discussed
in the preceding section. Poor ordering, especially in the prediction pool,
will be indicated by frequent wrong analyses on the first pass of a sentence
with the correct analysis noted on the hindsight file.

The end wipe prediction in its second role and the arbitrary choice
prediction are used in the second illustrative example (Fig. 5-2). The
same prediction span indicators are used in this example as in the example
of the preceding section. The same words in a rearranged order,
corresponding to the emphatic statement, Ноги имеет красный стол, will be used
(Fig. 5-2.1).

The analysis starts in the same manner as in the previous example.
After initializing the program (Fig. 5-2.2), the alternative arguments of
ноги are brought into the central memory location (Fig. 5-2.3). The
attributed argument, subject, and the preferred argument, /noun, nominative,
plural, feminine/, are assigned to ноги as a result of the first
intersection between the first prediction and the second alternative argument
(Fig. 5-2.4). The second and only other intersection between the object
prediction and the alternative argument, /noun, accusative, plural,
feminine/, is noted on the hindsight file.

The prediction pool is updated with the addition of the noun
complement prediction after the subject prediction has been wiped
(Fig. 5-2.5). Since the subject prediction has been fulfilled, the
predicate head prediction can be modified so that only a third person,
plural, and feminine predicate can fulfill the prediction.

The one alternative argument of the verb имеет is brought into the
central memory location (Fig. 5-2.6) and is tested against the predictions
in the pool. There is no intersection with the noun complement prediction.
Likewise, there is no intersection with the predicate head prediction since
имеет is singular and the prediction has been modified so that only a
plural predicate can fulfill it. No intersections are discovered in testing
the alternative argument against the object and infinity predictions. (The
latter prediction will be discussed in Section 5.)

The lack of an intersection is sensed by the end wipe prediction,
which then wipes some of the predictions from the prediction pool
(Fig. 5-2.7). The predictions for which PSI = 02 are not wiped (by the
definition adopted for this example). Since two of the predictions,
predicate head and object, that are wiped have PSI = 01, their wiping is
recorded on the hindsight file. The testing for intersections is continued.
Since there is no intersection with the end of sentence prediction, the
arbitrary choice prediction selects the alternative argument as the preferred
argument and assigns the attributed argument, arbitrary choice, to имеет.
The arbitrary choice prediction also increments the chain number (Fig. 5-2.8).

Even though the verb is transitive, no object prediction which can
be modified is left in the prediction pool. The remaining four predictions
are pushed to the top of the prediction pool (Fig. 5-2.9). The two
alternative arguments of красный are then brought into the central memory
location (Fig. 5-2.10).

Once more, no intersections have been found when the end wipe
prediction is being tested. But since there are no predictions in the pool
that can be wiped, there is no explicit change in the pool. No intersections
have been found when the arbitrary choice prediction is being tested, so that
the attributed argument, arbitrary choice, is assigned to красный. Since
there are two alternative arguments of красный, the first one is arbitrarily
selected as the preferred argument, and the second one is recorded on the
hindsight file (Fig. 5-2.11).

A master prediction is entered at the top of the updated prediction
pool since красный has an adjectival preferred argument (Fig. 5-2.12). The
alternative arguments of стол are brought into the central memory location
(Fig. 5-2.13) and are tested against the predictions in the pool. A single
intersection is discovered which results in the attributed argument, master
(of arbitrary choice), and the preferred argument, /noun, nominative, singular,
masculine/ (Fig. 5-2.14).

The prediction pool is updated with the addition of a noun complement
prediction (Fig. 5-2.15), and the alternative argument of the punctuation
mark is brought into the central memory location (Fig. 5-2.16). The
single intersection resulting in the preferred argument, /end of sentence/,
is noted on the main output file (Fig. 5-2.17), after which the prediction
pool is updated for the last time (Fig. 5-2.18). The noun complement
prediction is wiped at this time because its PSI = 00.

If the output is now scanned, the chain number is discovered not to be
equal to zero, and if the sentence is, indeed, grammatically correct, it
can be assumed that there was an error in the analysis. In the analysis of
this sentence, the error can be identified in the hindsight by the alternate
object attributed argument of ноги and the wiped object prediction. A
second pass through the sentence, assigning the alternative attributed
argument to ноги, would lead to a correct syntactic analysis.

Although in the analysis of the first word of the sentence there
was an error which was subsequently discovered when analyzing the second
word, no attempt was made to correct the error at that time. In this
sentence the error was obvious and could have been corrected immediately.
But it is possible that errors in other sentences might not be so obvious,
and there might be several clues throughout the remainder of the sentence
that would aid in determining the necessary correction. While continuing
with the analysis, the subordinate nested structure of the noun phrase,
красный стол, was correctly identified, as would be any other nested structure
that followed in its entirety the identification of the error. Unless some
evidence suggesting that corrections be made at once when the errors are
discovered comes to light, correction will be attempted only after the
analysis of an entire sentence.

Since  the  implications  involved  in  error  correction  are  not  yet 
clear  or  understood,  no  attempt  has  been  made  yet  to  write  such  a  program. 

To conclude the discussion of this illustration, it is interesting
to see what would have happened if the end wipe and arbitrary choice
predictions had not been invoked and the noun complement, predicate head
and object predictions had been allowed to remain in the pool after the
error was discovered. Имеет would have modified the object prediction so
that only an accusative object would have fulfilled the prediction. The
adjective красный would have been accepted as the object of имеет, and стол
would have been accepted as the master of (the accusative adjective)
красный. This result seems to be far less satisfactory than the one
illustrated.

4.  The  Predictive  Syntactic  Analysis  Program 

The input to the predictive syntactic analysis program is a text,
in which every word is represented by a line in the texthadic format
(Fig. 5-3a) (see Section 3.4). Two outputs, the main output file (Fig. 5-3b)
and the hindsight file (Fig. 5-3c), are produced by the program. Column 9,
which in the texthadic format contains the dictionary entry number, is
replaced on both output files by the attributed argument of the word and by
the text serial number (modulo 1000) of the word that was the source of the
prediction that resulted in the attributed argument. In columns 6 and 7 of
the output file, the alternative arguments are replaced by the preferred
argument. On the hindsight file, each intersecting alternative argument
that has not been selected as the preferred argument is represented by
a line, and the alternative argument itself is placed in columns 6
and 7. Two extra columns exist on the main output file which are
referred to as columns 3A and 3B. Column 3A contains the chain number
after the analysis of the word represented by the 10-word item.
Column 3B contains the number of predictions in the prediction pool
before the analysis of the current word. Moreover, whenever a prediction
which should have been fulfilled is wiped from the pool, it is marked on
the hindsight file (Fig. 5-3d).
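
The columns that the program rewrites on the main output file can be summarized in a small record type. This is only a schematic view: the class, field, and function names are hypothetical, and the remaining columns of the item are omitted.

```python
from dataclasses import dataclass


@dataclass
class MainOutputItem:
    preferred_argument: str   # columns 6 and 7
    attributed_argument: str  # column 9, first part
    source_serial: int        # column 9, second part (text serial number mod 1000)
    chain_number: int         # column 3A, value after analysis of this word
    pool_size_before: int     # column 3B, predictions in pool before this word


def make_item(preferred, attributed, source_text_serial, chain_number, pool_size):
    """Build the rewritten columns for one analyzed word."""
    return MainOutputItem(
        preferred_argument=preferred,
        attributed_argument=attributed,
        source_serial=source_text_serial % 1000,
        chain_number=chain_number,
        pool_size_before=pool_size,
    )


item = make_item("noun, nominative, singular, masculine", "master", 12001, 0, 7)
```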

It should be stressed once more that the single English
correspondent of the Russian word that is included in a texthadic item has
little significance in the translation of the examples given in this
chapter. The purpose of its appearance is to aid the reader who understands
no Russian.

The  machine  program  that  has  been  written  by  Bossert  consists  of 
two  sets  of  subroutines  in  addition  to  a  skeletal  section.  The  actual 
analysis  is  carried  out  by  the  subroutines  while  the  skeletal  section 
performs  the  necessary  bookkeeping  tasks.  The  skeleton  provides  the 
mechanism  for  stepping  through  both  the  predictions  in  the  pool  and  the 
alternative  arguments,  so  that  a  single  alternative  argument  is  tested 
against  a  single  prediction  at  a  time.  It  also  provides  the  mechanism 
for  updating  the  prediction  pool. 

The first set of 22 subroutines, called essences (Table 5a of
Appendix F), represent syntactic relationships that are predicted and
fulfilled during syntactic analysis. The subroutines themselves carry
out all the tests to determine whether a prediction is fulfilled by one
(or more) of a set of alternative arguments. There is an essence
subroutine for every prediction that can be stored in the prediction pool.

The second set of 25 function type subroutines (Table 5b of
Appendix F) represent word categories similar to the familiar "parts of
speech" and "syntactic roles". These subroutines make the new predictions
which are then put into the prediction pool and also modify existing
predictions in the pool. The first group consists of 15 subroutines
that represent the parts of speech and make new predictions based on the
preferred arguments of the analyzed words, whereas the second group consists
of 10 subroutines that represent the syntactic roles and modify existing
predictions in the pool according to the attributed arguments of the
analyzed words.

If the name of a subroutine is likely to be misleading, a suffix
"-E" for essence type subroutine and a suffix "-T" for the function type
subroutine have been appended to the name.

The subroutines are completely independent, that is, only one
subroutine is used at a time. Once control is passed to the subroutine,
the subroutine retains control until the testing or generating process
is completed, after which control is returned to the skeletal program.
The relationships among the alternative arguments, the predictions, and
the subroutines are shown in the tables of Appendix F. A detailed example
of the use of the tables is also given in the appendix.
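
The division of labor between the skeleton and the two sets of subroutines can be sketched with two lookup tables. The subroutine names, the rules inside them, and the dictionary layout are all hypothetical; the actual essences and function types are those tabulated in Appendix F. For brevity the sketch stops at the first intersection and records nothing on hindsight.

```python
# Essence type subroutines: can this alternative argument fulfill the prediction?
def subject_e(arg):
    return arg["pos"] in ("noun", "adjective") and arg["case"] == "nominative"


def object_e(arg):
    return arg["pos"] in ("noun", "adjective") and arg["case"] in (
        "accusative", "instrumental")


# Function type subroutines: which predictions does the preferred argument make?
def adjective_t(arg, word_no):
    return [{"name": "master", "psi": "00", "source": word_no}]


def noun_t(arg, word_no):
    return [{"name": "noun complement", "psi": "00", "source": word_no}]


ESSENCES = {"subject": subject_e, "object": object_e}
FUNCTION_TYPES = {"adjective": adjective_t, "noun": noun_t}


def skeleton_step(pool, alternatives, word_no):
    """Step through the pool, testing one prediction against one argument at a time."""
    for prediction in pool:
        essence = ESSENCES[prediction["name"]]
        for arg in alternatives:
            if essence(arg):  # control passes to exactly one subroutine
                made = FUNCTION_TYPES[arg["pos"]](arg, word_no)
                rest = [p for p in pool if p is not prediction]
                return prediction["name"], arg, made + rest
    return None, None, pool


pool = [
    {"name": "subject", "psi": "01", "source": "INIT."},
    {"name": "object", "psi": "01", "source": "INIT."},
]
noun_args = [
    {"pos": "noun", "case": "nominative"},
    {"pos": "noun", "case": "accusative"},
]
attributed, chosen, new_pool = skeleton_step(pool, noun_args, 1)
```

Keeping the tests and the generators in separate tables mirrors the statement that only one subroutine is in control at a time, with the skeleton doing the bookkeeping in between.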

The interrelationships between the predictions and the alternative
arguments have been condensed and summarized so that they could be
presented, in their entirety, on two pages (Tables 5-1 and 5-2). The
preferred arguments that can fulfill the 22 essences (or predictions) are
listed in Table 5-1. In Table 5-2 are listed the predictions that are
made or modified by the preferred and attributed arguments.

As an example of the use of these tables, the subject prediction
can be fulfilled by a noun, pronoun, adjective, numeral, or verb alternative
argument (Table 5-1). This table does not indicate that the first four
alternative arguments must be nominative, nor does it indicate that the
verb must be infinitive. For this detailed information, the tables in
Appendix F must be referred to. If a noun is selected as the subject, the
noun complement prediction is made by the noun preferred argument, the
predicate head prediction is modified, and a compound subject prediction
as well as an infinity and end wipe prediction is made by the adjective-noun
subject attributed argument (Table 5-2).

With  the  set  of  subroutines  that  are  in  the  experimental  program, 
the  following  nested  structures  are  recognized: 

1. Noun structure - a string of adjectives terminated by a
   single noun, where all the adjectives and the noun agree
   in case, number, and gender.

2. Noun phrase - a noun structure in any case possibly followed
   by one or more noun structures in the genitive case.

3. Prepositional phrase - a preposition followed by a noun
   phrase where the initial noun structure is in a case
   that can be governed by the preposition.

4. Verb phrase (including participial phrase) - a verb
   or participle in any mood followed by one or more noun
   phrases in cases which can be governed by the verb or
   participle.

5. Clause - independent and dependent clauses are both
   treated in the same manner in the present program.
   Only three fundamental elements of a clause are
   considered: subject, predicate, and object. Usually
   there are several phrase structures within a clause.
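
The first of these definitions is simple enough to state as a direct check. The sketch below is not the predictive method itself, which recognizes the structure through master predictions; it only restates the definition of a noun structure for words whose arguments have already been selected. The function name and tuple layout are assumptions.

```python
def noun_structure_length(words):
    """Return how many leading words form a noun structure, or 0 if none.

    Each word is a tuple (part of speech, case, number, gender).
    A noun structure is a string of adjectives terminated by a single noun,
    all agreeing in case, number, and gender.
    """
    agreement = None
    for index, (pos, case, number, gender) in enumerate(words):
        if pos not in ("adjective", "noun"):
            return 0
        if agreement is None:
            agreement = (case, number, gender)
        elif (case, number, gender) != agreement:
            return 0
        if pos == "noun":
            return index + 1  # the noun terminates the structure
    return 0  # ran out of words without reaching a noun


phrase = [
    ("adjective", "nominative", "singular", "masculine"),
    ("noun", "nominative", "singular", "masculine"),
    ("verb", "-", "singular", "-"),
]
length = noun_structure_length(phrase)
```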

The  nested  structures  in  the  sentence  often  Include  combinations 
of  clauses  and  the  several  types  of  phrases.  All  the  efforts  until  now 
have  concentrated  on  identifying  all  the  members  of  a  given  clause  or 
phrase  so  that,  at  this  time,  there  is  no  scheme  in  the  program  to 
determine  the  syntactic  relationships  among  the  phrases  and  clauses. 
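
The first two definitions in the list above (noun structure and noun phrase) can be sketched as simple agreement checks. This is only an illustration, not the program's own coding: the `Word` record, its field names, and the assumption that every word has already been reduced to a single case, number, and gender are all hypothetical. The sample words are taken from the prepositional phrase analyzed in Section 5.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Word:
    # Hypothetical record: one already-resolved reading of a word.
    text: str
    pos: str      # "adjective" or "noun"
    case: str
    number: str
    gender: str


def is_noun_structure(words: List[Word]) -> bool:
    """Adjectives terminated by a single noun, all agreeing in case, number, gender."""
    if not words or words[-1].pos != "noun":
        return False
    head = words[-1]
    for w in words[:-1]:
        if w.pos != "adjective":
            return False
        if (w.case, w.number, w.gender) != (head.case, head.number, head.gender):
            return False
    return True


def is_noun_phrase(structures: List[List[Word]]) -> bool:
    """A noun structure in any case, optionally followed by genitive noun structures."""
    if not structures or not all(is_noun_structure(s) for s in structures):
        return False
    return all(s[-1].case == "genitive" for s in structures[1:])


load = [Word("анодной", "adjective", "locative", "singular", "feminine"),
        Word("нагрузке", "noun", "locative", "singular", "feminine")]
tube = [Word("лампы", "noun", "genitive", "singular", "feminine")]
print(is_noun_structure(load), is_noun_phrase([load, tube]))
```

The sketch checks finished structures only; the program itself builds them incrementally from predictions, as described in the rest of this section.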

The  steps  of  the  experimental  predictive  syntactic  analysis  program 
parallel  quite  closely  the  steps  in  the  algorithms  of  the  preceding  chapter. 
The individual steps of the program are summarized formally in Iverson's
notation (Program 5-1).

The program is initialized for each sentence in steps 1 and 2. The chain
number is set to zero and an initial set of predictions is stored in the
prediction pool.

The first word on the input file is read into the temporary store
and the alternative arguments of the word are listed in r (Steps 3 and 4).
A matrix, in which will be recorded all the possible pairs of preferred
argument and attributed argument, is cleared in step 5.

The index i is initialized in step 6 and decremented in step 7 to
allow for the process of step 8, where all the alternative arguments are
tested against each prediction in the order of the listing of the predictions
in the pool. This testing for intersections results in a logical vector
with the same length as r, each component of the vector equal to "1" if the
corresponding alternative argument of r can satisfy the prediction. The
vector r is then reduced by this logical vector, and the corresponding
potential preferred arguments are stored in the matrix. For each potential
preferred argument, the appropriate potential attributed argument is stored
in the matrix as well (Step 9).

When this process has been carried out for each prediction in the
pool, the program checks whether any preferred arguments have been discovered
(Step 10). If not, the chain number is incremented in step 11 to indicate
that there has been an error in the analysis, all the alternative arguments
are transferred to the matrix (Step 12), and the arbitrary choice attributed
argument is placed into the corresponding positions (Step 13), so
that the program can arbitrarily choose a preferred argument.

In step 14, the first alternative argument in the matrix, which is the
first alternative argument intersecting with a prediction in the pool, is
taken as the preferred argument. If no prediction has been fulfilled, the
first alternative argument on the list is recorded as the preferred argument
on the main output file (Step 14). In either case, the appropriate attributed
argument is also recorded on the main output file.

All the other alternative arguments on the list are stored in the
hindsight file (Step 15). In the last step, new predictions are inserted
at the top of the prediction pool based on the preferred argument and the
attributed argument of the analyzed word. The old prediction pool is
appended to the new pool from below after suitable modifications,
including the wiping of predictions due to the activation of end wipe,
have been made to the predictions in the old pool.

The process returns to step 3 and the next word is read into the
temporary store.
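
The per-word part of this procedure (testing, selection, and hindsight recording) can be sketched as follows. This is a minimal illustration under stated assumptions, not a transcription of Program 5-1: alternative arguments are modeled as sets of feature names, a prediction as a name plus the features it requires, and the names `analyze_word`, `pool`, and `alternatives` are hypothetical. The final pool update and the end wipe are omitted, and the four readings given for the sample adjective are assumed for illustration.

```python
from typing import FrozenSet, List, Tuple

Argument = FrozenSet[str]
Prediction = Tuple[str, FrozenSet[str]]   # (name, features the word must carry)


def analyze_word(alternatives: List[Argument], pool: List[Prediction],
                 chain_number: int):
    # Test every alternative argument against each prediction, in pool order,
    # collecting (potential preferred argument, attributed argument) pairs.
    pairs = []
    for name, required in pool:
        for alt in alternatives:
            if required <= alt:          # the alternative satisfies the prediction
                pairs.append((alt, name))
    # No intersection at all: count an error and fall back on arbitrary choice.
    if not pairs:
        chain_number += 1
        pairs = [(alt, "arbitrary choice") for alt in alternatives]
    # The first pair is selected; all the others go to the hindsight file.
    preferred, attributed = pairs[0]
    hindsight = pairs[1:]
    return preferred, attributed, hindsight, chain_number


pool = [
    ("preposition complement (locative singular)", frozenset({"locative", "singular"})),
    ("preposition complement (locative plural)", frozenset({"locative", "plural"})),
    ("preposition complement (accusative singular)", frozenset({"accusative", "singular"})),
    ("preposition complement (accusative plural)", frozenset({"accusative", "plural"})),
]
# Assumed readings of a feminine singular adjective form (illustrative only).
alternatives = [
    frozenset({"adjective", case, "singular", "feminine"})
    for case in ("genitive", "dative", "instrumental", "locative")
]
preferred, attributed, hindsight, chain = analyze_word(alternatives, pool, 0)
print(sorted(preferred), attributed, hindsight, chain)
```

Because the outer loop runs over the pool, the position of a prediction in the pool decides which intersection is found first, which is why the ordering of predictions matters so much in the examples of the following sections.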

5.  The  Prepositional  Phrase 

The  occurrence  of  certain  words  in  a  sentence  such  as  adverbs, 
commas,  and  some  prepositions  cannot  be  predicted.  They  occur  without 
any  previous  signal  and  therefore  it  is  necessary  to  provide  a  special 
scheme  to  analyze  such  words.  In  the  experimental  program,  the  infinity 
prediction  is  the  mechanism  that  permits  the  identification  of  such 
words  independent  of  preceding  words  in  a  sentence.  Since  there  is  an 
infinity  prediction  in  the  pool  at  all  times,  these  words  are  always 
predicted. 

If  a  word  is  predicted  by  the  infinity  prediction,  the  syntactic 
structure  of  the  sentence  is  incomplete.  All  that  is  known  about  the 
word  is  the  nest  within  which  it  belongs,  since  each  infinity  prediction 
is  located  in  a  set  of  predictions  in  the  pool  representing  a  nested 
structure of the sentence under analysis. Only after the entire nest
has  been  analyzed  can  the  word  predicted  by  infinity  be  tied  syntactically 
to  the  rest  of  the  nested  structure.  The  infinity  prediction  always 
inhibits  the  action  of  the  end  wipe  and  arbitrary  choice  predictions, 
since it is located above the other two predictions in the pool.

Three examples will be used as illustrations of the analysis of
prepositional phrases. The texthadic input, the main output file containing
the preferred and attributed arguments, and the hindsight file (if any) will
be shown with each example.

A straightforward analysis is illustrated by the phrase на анодной
нагрузке лампы (Fig. 5-4). The rules for the analysis are given in
Appendix F. The single alternative argument, /preposition/, of на fulfills
an  infinity  prediction,  at  least  one  of  which  is  always  in  the  prediction 
pool.  Since  no  other  intersection  is  possible,  nothing  is  written  on  the 
hindsight  file. 

A preposition complement prediction is made for every case and number
combination that the preposition can govern. The four combinations that на
can generate are indicated in column 6 of the texthadic item. The priority
list for the ordering of the preposition complement predictions is given by
the first three characters of column 8. In this instance, the prepositional
(locative) predictions are listed prior to the accusative predictions. The
singular prediction is always predicted prior to the plural prediction. The
first few predictions in the pool after the analysis of на are:

1.  Preposition  complement  (locative  singular) 

2.  Preposition  complement  (locative  plural) 

3.  Preposition  complement  (accusative  singular) 

4.  Preposition  complement  (accusative  plural) 

5.  etc.  (old  predictions) 

Two predictions for each case are made for historic reasons only.
It was convenient originally to make separate predictions for each case
and number combination. To reduce the number of predictions in future
versions of the experimental program, only two predictions should be made,
one for each case. Then each prediction would accept either a singular or
a plural prepositional complement.
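
The ordering rule just described (cases in priority order, singular before plural) and the proposed reduction to one prediction per case can be sketched together. The function name `complement_predictions` and the representation of a prediction as a (case, number) pair are assumptions made for illustration; `None` stands for "either number".

```python
from typing import List, Optional, Tuple


def complement_predictions(case_priority: List[str],
                           per_number: bool = True) -> List[Tuple[str, Optional[str]]]:
    """List preposition complement predictions in the order they enter the pool."""
    if per_number:
        # Existing scheme: one prediction per case and number, singular first.
        return [(case, number)
                for case in case_priority
                for number in ("singular", "plural")]
    # Proposed scheme: one prediction per case, accepting either number.
    return [(case, None) for case in case_priority]


current = complement_predictions(["locative", "accusative"])
proposed = complement_predictions(["locative", "accusative"], per_number=False)
print(current)
print(proposed)
```

With the priority list locative, accusative, the first call reproduces the four predictions listed above, and the second yields the two predictions suggested for future versions.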

The four alternative arguments of анодной are brought into a central
memory location and are tested against the predictions for intersections.
There  is  only  one  intersection  between  the  first  four  predictions  and  the 
alternative  arguments,  resulting  in  the  preferred  argument  /adjective, 
locative,  singular,  feminine/  and  the  attributed  argument  preposition 
complement.  There  are  no  other  intersections  with  the  previous  predictions 
in  the  pool  (from  the  earlier  words  in  this  sentence),  so  nothing  is  recorded 
on  the  hindsight  file. 

Since the PSI of the preposition complement prediction is 00, the
three predictions which have not been fulfilled are wiped from the pool.
Four new predictions are inserted at the top of the new pool, the master
prediction by the adjectival preferred argument and the other three by the
preposition complement attributed argument, in the following order:

(1) Master (of preposition complement) (locative,
singular, feminine)

(2) Compound preposition complement (locative)

(3) Infinity

(4) End Wipe

(5) etc. (old predictions)


The two alternative arguments of нагрузке are brought into the
central memory location. Once more there is only a single intersection
between the alternative arguments and the predictions in the pool, and
нагрузке is assigned the preferred argument /noun, locative, singular,
feminine/  and  the  attributed  argument  master.  Nothing  is  recorded  on  the 
hindsight  file. 

Since  the  master  attributed  argument  makes  no  predictions,  only 
the  prediction  of  noun  complement  is  made  by  the  preferred  argument.  This 
prediction  replaces  the  fulfilled  master  prediction  at  the  top  of  the  pool 
as follows:

(1) Noun complement

(2) Compound preposition complement (locative)

(3) Infinity

(4) End Wipe

(5) etc. (old predictions)

When the three alternative arguments of лампы are tested against
the  predictions,  the  only  intersection  results  in  the  preferred  argument 
/noun,  genitive,  singular,  feminine/  and  the  attributed  argument  noun 
complement.  Once  more  nothing  is  written  on  the  hindsight  file. 

Several interesting points of this analysis are worth noting:

(1) All the predictions for the analysis of the prepositional
phrase were located above the predictions that were in the pool just before
the phrase occurred. In fact, there is an end wipe prediction located
between the old predictions in the pool and the remaining new predictions.
The analysis of the phrase has been carried out entirely independently of
any previous analysis of the sentence.

(2) All ambiguities in the adjective and the two nouns have been
completely resolved, and a unique case and number has been assigned to
each word.

(3) The analysis of the prepositional phrase was completed with
no arbitrary choices, and no alternatives were recorded on the hindsight
file. This would indicate that the analysis was carried out correctly and
no other analysis could have been possible.

(4) The prepositional phrase consists of the preposition на and
the two noun structures анодной нагрузке and лампы which together make up
a noun phrase.

In contrast to the simple analysis of this phrase, consider the
phrase в последующих каскадах (Fig. 5-5). The preposition в, which fulfills
the infinity prediction, makes four preposition complement predictions as
did на in the previous example, except that they are listed in the opposite
order, accusative first and locative second, since the priority order in
column 8 is different.

There are several intersections between the alternative arguments
of последующих and the predictions in the pool. The first intersection is
between the alternative argument, /adjective, accusative, plural/, and the
accusative plural preposition complement prediction. The second intersection
is between the alternative argument, /adjective, locative, plural/, and the
locative plural preposition complement prediction. As usual, the alternative
argument of the first intersection is assigned as the preferred argument,
and the other intersections are recorded on the hindsight file.

The  analysis  of  the  preceding  words  in  this  sentence  generated  two 
other,  older,  predictions,  which  are  fulfilled  by  the  alternative  arguments 
of последующих. These two intersections are noted in hindsight; but, since
they  actually  have  nothing  to  do  with  the  analysis  of  the  phrase,  they  will 
be  neglected  here. 

The  prediction  pool  is  updated,  and  at  the  top  of  the  pool  is 
entered a new set of four predictions:

(1) Master (of preposition complement) (accusative plural)

(2) Compound preposition complement (accusative)

(3) Infinity

(4) End Wipe

(5) etc. (old predictions)

There are no intersections whatsoever between the single alternative
argument of каскадах, /noun, locative, plural, masculine/, and the
predictions in the pool. When no intersections are found during the testing
of the first three predictions, the end wipe prediction is activated,
and all four predictions are marked for wiping from the pool. Since the
master prediction has a PSI of 01, its wiping is listed on the hindsight
file. Because no intersections are found when the rest of the predictions
in the pool are tested, the alternative argument is taken as the preferred
argument by the arbitrary choice prediction, and the arbitrary choice
attributed argument is assigned. The chain number is then incremented from
05 to 06.

As is obvious even to the casual reader of Russian, the wrong case,
the accusative instead of the locative, was selected for последующих. On
a following pass, this information would be sufficient to select the locative
alternative argument as the preferred argument.
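
The behavior of the end wipe in this example can be sketched as a single scan of the pool. The sketch is illustrative only: the `Prediction` record, the `scan_pool` function, the acceptance tests attached to each prediction, and the single "old prediction" placed below the end wipe are all assumptions. Only the PSI value of the master prediction (01) is taken from the text; the others are left unset.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional

Argument = FrozenSet[str]


@dataclass
class Prediction:
    name: str
    accepts: Callable[[Argument], bool]
    psi: Optional[str] = None   # "01": wiping is noted on the hindsight file


def requires(*features: str) -> Callable[[Argument], bool]:
    needed = frozenset(features)
    return lambda alt: needed <= alt


def scan_pool(alt: Argument, pool: List[Prediction], chain_number: int):
    hindsight, kept, nest = [], [], []
    attributed = None
    for pred in pool:
        if pred.name == "end wipe":
            if attributed is None:
                # Nothing above matched: wipe the nest, noting PSI 01 entries.
                hindsight += ["wiped " + p.name for p in nest if p.psi == "01"]
            else:
                kept += nest + [pred]
            nest = []
            continue
        nest.append(pred)
        if attributed is None and pred.accepts(alt):
            attributed = pred.name
    kept += nest
    if attributed is None:
        attributed = "arbitrary choice"
        chain_number += 1       # an error in the analysis is signalled
    return attributed, [p.name for p in kept], hindsight, chain_number


pool = [
    Prediction("master of preposition complement", requires("accusative", "plural"), "01"),
    Prediction("compound preposition complement", requires("accusative")),
    Prediction("infinity", lambda alt: bool(alt & {"adverb", "comma", "preposition"})),
    Prediction("end wipe", lambda alt: False),
    Prediction("old prediction (hypothetical)", requires("nominative")),
]
word = frozenset({"noun", "locative", "plural", "masculine"})
attributed, kept, hindsight, chain = scan_pool(word, pool, 5)
print(attributed, kept, hindsight, chain)
```

In this run the nest above the end wipe is wiped, the wiping of the master prediction is noted, and the chain number moves from 5 to 6, matching the account given above.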

All the errors in prepositional phrases that have been made so
far by the program occur with the preposition в, as in the preceding example.
This suggests that there is an error in the ordering of the preposition
complement predictions in the pool for в, since the correct prediction is
always located below the one selected as the attributed argument. The
priority order of the cases governed by в should be inverted, so that the
locative case is tested before the accusative case. This can be done by
modifying the information in the first two character positions of column 8
for the preposition.
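
The effect of inverting the priority order can be seen in a small sketch. The function `first_intersection` and the dictionary representation of alternative arguments are hypothetical, and only the two readings of the adjective that are mentioned in the example are included.

```python
from typing import Dict, List, Optional


def first_intersection(case_priority: List[str],
                       alternatives: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return the reading matched by the highest-priority complement prediction."""
    for case in case_priority:
        for number in ("singular", "plural"):      # singular is tested first
            for alt in alternatives:
                if alt["case"] == case and alt["number"] == number:
                    return alt
    return None


# The two readings of the adjective discussed in the example above.
adjective = [
    {"pos": "adjective", "case": "accusative", "number": "plural"},
    {"pos": "adjective", "case": "locative", "number": "plural"},
]
as_coded = first_intersection(["accusative", "locative"], adjective)
inverted = first_intersection(["locative", "accusative"], adjective)
print(as_coded["case"], inverted["case"])
```

With the order as coded, the accusative reading is found first; with the inverted order, the locative reading is selected, so that the master prediction it sets up would agree with a following locative plural noun.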

Another example of more interesting nesting is offered by the string
измерить среднюю за много периодов амплитуду (Fig. 5-6). The single
alternative argument of среднюю intersects with three predictions in the
pool. It fulfills the prediction of object of the verb infinitive измерить.
The other two intersections with earlier object predictions are recorded on
the hindsight file. A master prediction is entered at the top of the pool.

The following preposition за fulfills an infinity prediction and
sets up four preposition complement predictions above the master prediction
as follows:

(1) Preposition complement (instrumental singular)

(2) Preposition complement (instrumental plural)

(3) Preposition complement (accusative singular)

(4) Preposition complement (accusative plural)

(5) Master (of object) (accusative singular feminine)

(6) etc. (old predictions)

The following numeral много has eight alternative arguments:

(1)  /adjectival,  nominative,  singular/ 

(2)  /nominal,  nominative,  singular/ 

(3)  /adjectival,  accusative,  singular/ 

(4)  /nominal,  accusative,  singular/ 

(5)  /adjectival,  nominative,  plural/ 

(6)  /nominal,  nominative,  plural/ 

(7)  /adjectival,  accusative,  plural/ 

(8) /nominal, accusative, plural/

There  are  fourteen  intersections  among  the  alternative  arguments 
and  the  predictions  in  the  pool.  The  first  intersection  is  between  the 
third  prediction  in  the  pool  and  the  third  alternative  argument,  resulting 
in  the  preferred  argument  and  attributed  argument  listed  on  the  main  output 
file.  The  fourth,  seventh,  and  eighth  alternative  arguments  also  intersect 
with  the  third  prediction,  and  there  are  intersections  between  the  fifth 
prediction  and  the  third  and  fourth  alternative  arguments.  These,  and  the 
remaining  eight  intersections  are  listed  on  the  hindsight  file  in  the  order 
in which they are identified.

Numerals, when used adjectivally, make special master predictions
dependent on the information contained in column 8. The numeral много
predicts a master in the genitive case, and either singular or plural. Since
the remaining unfulfilled preposition complement predictions are wiped from
the pool (PSI is equal to 00), the top of the new pool after the analysis of
много is as follows:

(1)  Master  (of  preposition  complement)  (genitive) 

(2)  Compound  preposition  complement  (accusative) 

(3)  Infinity 

(4)  End  Wipe 

(5)  Master  (of  object)  (accusative,  singular,  feminine) 

(6)  etc.  (old  predictions) 

The single alternative argument of периодов intersects with the first
master prediction, resulting in the attributed argument master of preposition
complement. The noun preferred argument makes a new prediction of a noun
complement, replacing the fulfilled master prediction at the top of the pool.

Next, the single alternative argument of амплитуду is brought into
the central memory location and tested against the predictions in the pool.
None of the first four predictions is fulfilled, so that the end wipe
prediction is activated. This is a signal that the analysis of the
prepositional phrase has been completed and the predictions of another nest
are about to be tested. There is an intersection with the following master
of object prediction which is noted on the main output file. Two other later
intersections are then also noted. The final analysis shows the prepositional
phrase за много периодов nested within the noun phrase среднюю амплитуду.




Although no mistakes have been shown in this section, it is possible
that they can occur, particularly when there is a legitimate ambiguity
in the syntax that cannot be resolved by syntactic analysis alone. Such a
situation will be shown in the next section.

6.  The  Identification  of  the  Subject,  Predicate,  and  Object  in  a  Clause 

The  recognition  of  the  subject,  predicate,  and  object  in  a  clause 
is  closely  akin  to  the  recognition  of  the  necessary  elements  within  any  of 
the  phrase  structures.  What  make  the  subject,  predicate,  and  object  unique 
are  the  grammatical  relationships  among  them  which  permit  the  subject, 
predicate  head,  and  object  predictions  to  be  modified  whenever  one  of  them 
is  fulfilled.  In  the  existing  experimental  program,  this  is  the  only  set 
of  predictions  that  behaves  in  such  a  manner. 

Whereas the subdivision of clauses into two divisions such as
Chomsky's⁵ noun phrase and verb phrase is the more common, in the present
scheme of predictive syntactic analysis for Russian it is convenient to
divide the clause into three divisions. This division adds facility to the
manipulation and modification of the subroutines.

Actually four rather than three predictions are utilized to carry
out the analysis of a clause, since both a left object prediction and an
object prediction are used. The left object, which can be fulfilled by an
accusative or instrumental adjective, noun, pronoun, or numeral, is predicted
with the subject and predicate head predictions and must be fulfilled before
the predicate has been identified, that is, it is located to the left of
the predicate; otherwise, it is wiped from the prediction pool when the
predicate is identified and an object prediction is made based upon the verb
government coding in the predicate head. Once more, it is simply a question
of convenience in coding and also in the arrangement of the program output.

In  a  majority  of  cases,  the  subject,  predicate,  and  object  occur  in 
the  order  mentioned;  however,  it  is  not  uncommon  to  find  a  sentence  where 
the positions of the subject and object are reversed. The reversed
construction occurs too frequently for the analysis not to have a mechanism to
recognize  it.  The  left  object  prediction  has  been  created  to  fulfill  the 
need  for  interpreting  on  the  first  pass  the  sentence  in  which  the  object 
precedes  the  predicate. 

There  is  no  obvious  disadvantage  to  this  scheme  of  operation.  Errors 
and mistakes due to this approach do occur, and an example of each will be
considered  later  in  this  section.  However,  since  all  the  alternative  schemes 
that  were  considered  seem  to  allow  at  least  as  many  errors  and  mistakes,  this 
approach  does  not  seem  disadvantageous. 

Initially,  predictions  of  subject,  left  object,  and  predicate  head 
are  entered  into  the  pool  in  that  order.  The  predicate  head  prediction  is 
modified if either the subject or the left object predictions are fulfilled
first. Likewise, the predicate head prediction modifies the subject
prediction when the predicate head is fulfilled first. The modifications serve
to limit the number of alternative arguments that can intersect with the
modified predictions. This is particularly important because of the frequency
of occurrence of nouns and adjectives with at least two alternative arguments,
one nominative and the other either accusative or instrumental. Frink and
Kline have compiled some statistics on the frequency of the textual occurrence
of the various alternative arguments. Some of the figures based on a sample
of 9,618 nouns and adjectives found in texts are given in Table 5-4. It is
seen that more than one third of all nouns and adjectives have nominative-
accusative or nominative-instrumental alternative argument pairs which can
fulfill both subject and object (or left object) predictions. Without the
modifications of predictions for agreement in case and number, errors in
analysis would occur more often, and more passes would be needed to achieve
a correct analysis.
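
The percentages reported in Table 5-4 can be checked directly from the counts. The short computation below is only an arithmetic check; the dictionary layout and the row labels "both", "either", and "neither" are shorthand introduced here for the three rows of the table.

```python
# Counts from Table 5-4, by row and word class.
counts = {
    "both":    {"nouns": 2706, "adjectives": 878},
    "either":  {"nouns": 938,  "adjectives": 2429},
    "neither": {"nouns": 2509, "adjectives": 158},
}

# Column totals, and the combined total for nouns and adjectives.
totals = {col: sum(row[col] for row in counts.values())
          for col in ("nouns", "adjectives")}
totals["combined"] = totals["nouns"] + totals["adjectives"]


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as in the table."""
    return round(100 * part / whole, 1)


for label, row in counts.items():
    combined = row["nouns"] + row["adjectives"]
    print(label,
          percent(row["nouns"], totals["nouns"]),
          percent(row["adjectives"], totals["adjectives"]),
          percent(combined, totals["combined"]))
print(totals)
```

The combined share for the first row, 3,584 of 9,618, is about 37.3 percent, which is the "more than one third" cited in the text.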


                                    Nouns            Adjectives       Nouns and
                                                                      Adjectives

Words with alternative arguments    2,706   44.0%      878   25.3%    3,584   37.3%
that can fulfill both subject
and object predictions.

Words with alternative arguments      938   15.2%    2,429   70.1%    3,367   35.0%
that can fulfill either subject
or object predictions.

Words with alternative arguments    2,509   40.8%      158    4.6%    2,667   27.7%
that can fulfill neither subject
nor object predictions.

Total                               6,153            3,465            9,618

Frequency with which Text Occurrences of Nouns and Adjectives
Can Fulfill Subject and Object Predictions

TABLE 5-4

To illustrate the effect of prediction modification, several examples
will be used, and predictions which do not affect the modifications of
interest will be deliberately overlooked.

The most common sequence is subject-predicate-object, which is
represented by the sentence segment здесь мы определим значения... (Fig. 5-7).
The  subject,  left  object,  and  predicate  head  predictions  are  in  the  pool  in 
the  given  order.  Placing  the  subject  above  the  left  object  permits  a  word 
whose  alternative  arguments  intersect  with  both  predictions  to  be  selected 
as  the  subject. 

The first word, the adverb здесь, is accepted by the infinity
prediction. Since an adverb makes no new predictions the pool remains
unmodified. The pronoun мы has only one alternative argument, /pronoun,
nominal, first person, nominative, plural, masculine or feminine/ which
intersects only with the subject prediction. Since there are no other
intersections, nothing is recorded on the hindsight file. In updating the
prediction pool, the predicate head prediction is modified so that only a
first person, plural, and masculine or feminine predicate can fulfill the
prediction. The following verb определим satisfies these imposed conditions
so that it can be selected as the predicate head. The second alternative
argument of определим, /short form adjective, singular, masculine/, cannot
satisfy the conditions imposed on the predicate head prediction since the
alternative argument is singular and the modification is for plural only.

Since the predicate head has been fulfilled before the left object,
the latter prediction is wiped from the pool and a prediction for an
accusative object, based on the verb government code in column 5 of определим,
is entered into the new pool. Although значения has three alternative
arguments, there is only one intersection and the noun is selected as
the object of определим.
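
The modification of the predicate head prediction by the subject in this example can be sketched as a restriction that later alternative arguments must satisfy. The function names and the dictionary representation are assumptions made for illustration; gender is omitted for brevity, and a feature that an alternative argument does not specify is treated as compatible with the restriction.

```python
from typing import Dict, List


def restrict_predicate_head(subject: Dict[str, str]) -> Dict[str, str]:
    """Copy the agreement features of the subject onto the predicate head prediction."""
    return {key: subject[key] for key in ("person", "number") if key in subject}


def satisfies(alternative: Dict[str, str], restriction: Dict[str, str]) -> bool:
    # A restricted feature must match whenever the alternative specifies it.
    return all(alternative.get(key, value) == value
               for key, value in restriction.items())


subject = {"pos": "pronoun", "person": "first", "number": "plural",
           "case": "nominative"}
restriction = restrict_predicate_head(subject)

# The two alternative arguments of the verb form discussed above.
alternatives: List[Dict[str, str]] = [
    {"pos": "verb", "person": "first", "number": "plural"},
    {"pos": "short form adjective", "number": "singular", "gender": "masculine"},
]
accepted = [alt["pos"] for alt in alternatives if satisfies(alt, restriction)]
print(restriction, accepted)
```

Only the verb reading survives the restriction; the short form adjective is rejected because it is singular while the modified prediction calls for a plural predicate.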

An example of an adjective predicate head preceding a subject is
given in the next illustration, предложена методика... (Fig. 5-8). The
single alternative argument of предложена intersects with the predicate
head prediction. The subject prediction is then modified so that only a
singular and feminine subject can be accepted. Since методика fulfills
these limitations, it is accepted as the subject of the clause.

As an example of how the modification of predictions catches errors,
consider the string при подключении следующего накапливающего конденсатора
все явления повторяются... (Fig. 5-9). The subject, left object, and
predicate head predictions are in the pool together with an infinity
prediction. The preposition при is accepted by the infinity prediction which
leads to the identification of the prepositional phrase при подключении
следующего накапливающего конденсатора (Sec. 5). After the analysis of
конденсатора, a prediction for a noun complement is placed above the other
predictions of the clause.

The pronoun все has eight alternative arguments:

(1) /pronoun, adjectival, nominative, singular, neuter/

(2) /pronoun, adjectival, accusative, singular, neuter/

(3) /pronoun, adjectival, nominative, plural/

(4) /pronoun, adjectival, accusative, plural/

(5) /pronoun, nominal, nominative, singular, neuter/

(6) /pronoun, nominal, accusative, singular, neuter/

(7) /pronoun, nominal, nominative, plural/

(8) /pronoun, nominal, accusative, plural/

Four of these alternative arguments intersect with the subject prediction,
and the other four intersect with the left object prediction. Although
все is correctly identified as the subject of the clause, the wrong preferred
argument is selected, /pronoun, adjectival, 3rd person, nominative, singular,
neuter/. All of the seven other intersections are recorded on the hindsight
file. The subject attributed argument modifies the predicate head prediction,
so that only a singular and neuter predicate can fulfill the prediction.
The adjectival preferred argument sets up a master prediction, which must be
fulfilled (PSI 01), followed by an end wipe. A noun or adjective fulfilling
the master prediction must be nominative, singular, and neuter.

When the three alternative arguments of явления are brought into
the central memory location, none of them intersects with the leading master
prediction. The end wipe is activated, wiping the master prediction and
noting it on the hindsight file. The /noun, accusative, plural, neuter/
alternative argument intersects with the left object prediction. This
further modifies the predicate head prediction so that only a transitive
verb can be accepted.

The verb повторяются cannot fulfill the predicate head prediction,
since it is plural and reflexive. It cannot fulfill any other prediction
either, so that it is selected as an arbitrary choice after the predicate
head prediction is wiped and recorded on the hindsight file. The chain
number is incremented to indicate the error. In a later pass, if the subject
prediction were initially limited to plural subjects, the analysis would
proceed correctly.

The sentence Предметом настоящего сообщения является анализ
возможностей. (Fig. 5-10) is an example of the sequence object-predicate-
subject, which is quite common when there is a reflexive verb acting as the
predicate. The subject, left object, and predicate head predictions are at
the top of the pool when the alternative argument of предметом, /noun,
instrumental, singular, masculine/, is tested. The single intersection with
the left object prediction results in the selection of предметом as the
object of the clause. The predicate head is modified so that only a predicate
governing the instrumental case can be accepted. The noun phrase настоящего
сообщения is selected as the noun complement of предметом (see Sec. 5), after
which the alternative argument of является is tested. Once more there is a
single intersection, this time with the modified predicate head prediction.
The program can determine that the verb governs the instrumental case by
testing whether the verb is reflexive. Since a predicate has been selected
before a subject has been found, the subject prediction is modified so that
only a third person singular subject can be accepted. Although there are two
alternative arguments of анализ, there is only one intersection, and анализ
is chosen as the subject of the clause.

A Segment of a Clause
Fig. 5-10

Another sentence, в эту емкость помимо распределенной емкости
монтажа входят междуэлектродные емкости всех подключаемых ламп. (Fig. 5-11),
demonstrates that the ordering of the predictions can occasionally cause an
error. The sentence starts with two prepositional phrases, в эту емкость and
помимо распределенной емкости монтажа. The next word, the verb входят,
fulfills the predicate head prediction, on the one hand, modifying the subject
prediction so that only a third person plural subject will be accepted and,
on the other hand, after wiping the left object prediction, introducing an
accusative object prediction in the pool above the subject prediction. The
alternative arguments of the following adjective, междуэлектродные, intersect
with both the object and subject predictions. The attributed argument is
object since the first intersection is with the object prediction. Емкости
fulfills the master prediction generated by the preferred argument of
междуэлектродные, and всех подключаемых ламп is a noun complement noun
structure predicted by емкости. When the period, the punctuation mark
indicating the end of the sentence, is identified, all remaining predictions
that should have been fulfilled are recorded on the hindsight file. In this
case, the only such prediction is the unfulfilled subject prediction. This
would allow a future pass to identify междуэлектродные correctly as the
subject of the clause.

This particular error can be attributed to the relatively unsophisticated
way of handling object predictions. In the experimental program, verbs
are assumed to govern accusative objects, unless either the verb is in the
reflexive voice, or there is a government code in column 5 for another case.
A future program should be capable of interpreting phrase government in
addition to object government.

A sentence which cannot be analyzed uniquely by syntactic analysis
alone is illustrated by В локационной технике большое распространение
получило применение ... (Fig. 5-12). The noun phrase большое распространение
and the noun применение both have the same two alternative arguments,
/nominative, singular, neuter/ and /accusative, singular, neuter/. On a
first pass the noun phrase preceding the verb will be selected as the subject,
while the noun phrase following the verb will be selected as the object, and
an indication will be made on the hindsight file that the first noun phrase
might have been the object. It is obvious that if the sentence had read
...применение большое распространение..., the analyzed output would
be a syntactic analysis on the first pass which a reader of Russian would
immediately reject on semantic grounds, but which the experimental program
would accept as a correct syntactic analysis. This is an example of a
mistake in the program as opposed to an error.

7. Comma

Nested structures can be written in several notations in artificial
languages. Oettinger (ref. 7) has demonstrated four of these notations: left-
parenthetic, right-parenthetic, full parenthetic, and parenthesis-free.
Similar notations are used in the Russian language. A structure similar to
the left-parenthetic notation is the prepositional phrase, where the
preposition serves as an implicit left parenthesis. Likewise, the initial
adjective of a noun structure can be considered an implicit left parenthesis.

A notation equivalent to the full-parenthetic notation is also used
in the Russian language. The most trivial example is the explicit use of
the left and right parentheses to isolate a side comment within a paragraph
or even within an individual sentence. Nested structures such as participial
phrases and clauses are isolated from the rest of a sentence by commas.
Here, since only one symbol is used, a "left-comma" cannot be distinguished
from a "right-comma".

In the experimental program, the comma is recognized only in its
function as a phrase or clause separator. Other uses of the comma, such as
separating words or phrases used in series, have not been studied yet.

A comma may occur at any point in a sentence following the initial
word. Generally, there is no signal that the comma is about to occur. It
is necessary, therefore, to accept one of these punctuation marks with the
infinity prediction, and then to make a set of predictions of all the
structures that the comma might precede. Presently, three predictions are
used for this purpose: the phraser prediction that predicts gerunds and
participles, the relative conjunction prediction that predicts conjunctions
which introduce subordinate clauses, and the relative pronoun prediction
that predicts relative pronouns which also introduce subordinate clauses.
A relative conjunction such as когда has no syntactic role within the clause
that it introduces, whereas a relative pronoun such as который serves as a
noun or adjective within the clause that it introduces. Prepositional
phrases which might be offset by commas from the rest of the sentence are
also predicted, since there is an infinity prediction present in this set
as well. Twelve predictions are entered at the top of the prediction pool
after the identification of a comma, as follows:


(1)  Phraser

(2)  Infinity

(3)  End Wipe

(4)  Relative Conjunction

(5)  Infinity

(6)  End Wipe

(7)  Relative Pronoun

(8)  Subject (Inactive)

(9)  Left Object (Inactive)

(10)  Predicate Head (Inactive)

(11)  Infinity

(12)  End Wipe

The  inactive  subject,  predicate  head,  and  left  object  predictions 
are  not  tested.  Only  when  they  are  activated,  that  is,  their  PSI  is  reduced 
by  50  (see  Appendix  F),  are  they  tested.  These  predictions  are  activated 
only  after  the  fulfillment  of  either  the  relative  conjunction  or  the 
relative  pronoun  predictions.  They  then  serve  as  the  predictions  of  the 
clause  introduced  by  the  relative  conjunction  or  relative  pronoun. 
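
As a rough illustration of the set of twelve predictions and of activation by reducing the PSI by 50, consider the sketch below. The names `Prediction`, `comma_predictions`, and `activate_clause_predictions` are hypothetical, and the base PSI of 49 is an assumption chosen only so that an inactive prediction carries 99; the actual values are those of Appendix F.

```python
from dataclasses import dataclass

ACTIVATION_STEP = 50   # from the text: activation reduces the PSI by 50
BASE_PSI = 49          # assumed PSI of an ordinary active prediction

INACTIVE_NAMES = {"Subject", "Left Object", "Predicate Head"}
COMMA_ORDER = [
    "Phraser", "Infinity", "End Wipe",
    "Relative Conjunction", "Infinity", "End Wipe",
    "Relative Pronoun", "Subject", "Left Object", "Predicate Head",
    "Infinity", "End Wipe",
]


@dataclass
class Prediction:
    name: str
    psi: int   # prediction span indicator

    def is_active(self):
        # In this sketch a prediction is tested only while its PSI is below 50.
        return self.psi < ACTIVATION_STEP


def comma_predictions():
    """Build the twelve predictions entered after a comma, top of pool first."""
    return [
        Prediction(name, BASE_PSI + (ACTIVATION_STEP if name in INACTIVE_NAMES else 0))
        for name in COMMA_ORDER
    ]


def activate_clause_predictions(pool):
    """Called once a relative conjunction or relative pronoun is fulfilled."""
    for prediction in pool:
        if not prediction.is_active():
            prediction.psi -= ACTIVATION_STEP


pool = comma_predictions()
print([p.name for p in pool if not p.is_active()])
```

Keeping the inactive predictions in the pool from the start means they already occupy their level of nesting when the clause is recognized; activation then only changes a number.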

If only the use of a comma to isolate nested structures is considered,
a comma can serve two functions. On the one hand, it is used to introduce
a new nested structure and, on the other hand, it is used to indicate the
end of a nested structure and the return to a preceding nested structure
which had not been finished. This is illustrated schematically in Fig. 5-13.
The commas have been numbered for identification. Comma-1 introduces a new
nested structure, the second clause. Likewise, comma-2 introduces a new
nested structure, the participial phrase. Both comma-3 and comma-4 indicate
the end of a nested structure and the return to a previously uncompleted
structure, comma-3 to clause 2 and comma-4 to clause 1.


           (1)          (2)             (3)          (4)
 -------- , ---------- , ------------- , ---------- , --------
 Clause 1    Clause 2     Participial     Clause 2     Clause 1
                          Phrase

Schematic Representation of Nested Structures in a Sentence
Fig. 5-13

In the event a comma is being used to indicate return to an uncompleted
nested structure, none of the predictions, phraser, relative conjunction,
or relative pronoun, should be fulfilled. If an end wipe
prediction is placed below the set of predictions made by a comma, all of
these unfulfilled predictions should be wiped from the pool. Infinity and
end wipe predictions are placed underneath each of the three introductory
predictions, so that if a relative conjunction prediction is fulfilled, the
phraser prediction is immediately wiped from the pool, and if a relative
pronoun prediction is fulfilled, both the phraser and relative conjunction
predictions are wiped. The ordering of the phraser, relative conjunction,
and relative pronoun predictions is based on the possibility of multiple
intersections between these predictions and the alternative arguments of
a word, and the desirable initial guess of the preferred argument.

The inactive subject, left object, and predicate head predictions
are put into the pool at the same time as the relative conjunction and the
relative pronoun, so that they may be at their proper level of nesting when
a subordinate clause is positively identified. This will be made more
obvious by several examples.

As an example of the identification of a participial phrase, consider
the sentence: Нелинейные искажения в элементах схемы, осуществляющих
усреднение, неизбежно приведут... (Fig. 5-14). Нелинейные искажения is
identified as the subject noun phrase, after which the prepositional phrase
в элементах схемы is identified. The following comma is accepted by the
infinity prediction, and the phraser, relative conjunction, and relative
pronoun predictions are inserted into the pool above the original unfulfilled
predicate head and left object predictions. The two sets of predictions
are separated by an end wipe prediction. Осуществляющих fulfills the phraser
prediction and is identified as a participle.

A Segment of a Sentence with a Participial Phrase
Fig. 5-14

To digress somewhat at this point, it should be pointed out that
осуществляющих is tested for being a participle by the phraser prediction
(the  first  intersection)  and  is  tested  for  being  an  adjective  by  the  left 
object  prediction  (the  second  intersection).  Although  the  two  tests  are 
performed  on  the  same  word,  they  are  entirely  independent,  the  phraser 
prediction  not  recognizing  that  the  word  might  be  an  adjective,  and  vice 
versa. 

After the identification of the participle, a prediction for an
object of the participle is made and fulfilled by the following word, the
noun усреднение. The following comma makes a new set of predictions of
phraser, relative conjunction, and relative pronoun. A schematic diagram
of the prediction pool at this point is given in Fig. 5-15. Three nested
structures are evident at this time. At the top of the pool are the
predictions referring to a yet unidentified possible third nested structure.
Below are the predictions generated during the analysis of the already
identified participial phrase as well as the residue of unfulfilled
predictions due to the first comma. At the bottom of the pool are the original
predictions from the main clause which have not been fulfilled yet.

(1)  Predictions by 2nd Comma
     End Wipe
(2)  Predictions by Participial Phrase and 1st Comma
     End Wipe
(3)  Main Clause Unfulfilled Predictions

Schematic Diagram of Prediction Pool After Analysis
of Word OOA-1263 (Fig. 5-14)
Fig. 5-15

The first word after the second comma, неизбежно, is identified as
an adverb by the infinity prediction. Adverbs, like prepositions, often
cannot be predicted and must be satisfied by the infinity prediction. In
the present experimental system, the wiping of the prediction pool is
inhibited when an adverb is identified, so that the prediction pool after
the analysis of неизбежно is identical to the pool before the analysis of
the adverb. The two alternative arguments of приведут are then brought into
the central memory location. These two alternative arguments are almost
identical on a syntactic level. The only difference is that the first
alternative argument governs the dative case, indicated by the "P2", while
the second alternative argument governs the accusative case, indicated by
the "P1" in column 5.

Neither of the alternative arguments intersects with any of the
predictions made by the second comma. This indicates that a new nested
structure has not been found. Continuing the testing, no intersections are
found with the predictions from the participial phrase or from the first
comma. Wiping this second set of predictions leaves only the predictions
from the main clause. The predicate head prediction intersects with both
alternative arguments and, as usual, the first is selected as the preferred
argument. The intersection shows that the sentence has indeed reverted
to the main clause.

The analysis of a subordinate clause introduced by a relative
conjunction or a relative pronoun is not as straightforward as the analysis
of a participial phrase, chiefly because it is necessary to consider the
subject-predicate-object structure within the clause. A series of
illustrations will make the difficulties clear.

Consider first the subordinate clause: ..., который позволяет
изучать сигналы... (Fig. 5-16). The relative pronoun который has four
alternative arguments:

(1)  /relative pronoun, adjectival, nominative, 3rd person, singular/

(2)  /relative pronoun, adjectival, accusative, 3rd person, singular/

(3)  /relative pronoun, nominal, nominative, 3rd person, singular/

(4)  /relative pronoun, nominal, accusative, 3rd person, singular/

A Segment of a Subordinate Clause
Fig. 5-16

The first intersection between an alternative argument and one of the comma
predictions is between the relative pronoun prediction and the first
alternative argument. When the relative pronoun prediction is fulfilled, the
testing process is temporarily suspended. A special subroutine scans the
prediction pool, activating any inactive predictions in the pool, in this
case a subject, left object, and predicate head prediction. Otherwise, no
indication is made that the prediction has been fulfilled, and the testing is
resumed with the next prediction. There is a second intersection between the
newly activated subject prediction and the first alternative argument. Since,
to the program, it seems that this is the first intersection, который is
accepted as the subject of the clause. The reason for inserting the inactive
predictions into the pool earlier is now evident. If the inactive
predictions were not in the pool, it would be impossible to select который as
the subject of the clause. The relative pronoun can be considered as
fulfilling two independent functions: on the one hand, introducing a
subordinate clause and, on the other hand, taking on an active role within the
clause. In the procedure described, который activates the mechanism by
which it itself is identified.

There are seven other intersections between the predictions in the
pool and the alternative arguments of который, all of which are stored on
the hindsight file. Since an adjectival alternative argument of который
has been selected as the preferred argument, the prediction of a master is
inserted at the top of the pool above the left object and predicate head
predictions, which have just been activated.

Since the next word, позволяет, is a verb, the master prediction is
not fulfilled but is wiped from the pool and recorded on the hindsight file,
and позволяет is accepted as the predicate of the clause. The wiped
prediction is an indication that on the next pass through the sentence the
nominal alternative argument of который should be selected as the preferred
argument.

This technique is also effective when the relative pronoun is in an
oblique case and is part of another independent nested structure, as in
..., в котором усреднение сигнала осуществляется... (Fig. 5-17). The
preposition в is accepted by the infinity prediction, and the phraser, relative
conjunction, and relative pronoun predictions are pushed down deeper into
the pool when the new preposition complement predictions are entered at the
top.

Both the adjectival and the nominal alternative arguments intersect
with the locative singular preposition complement prediction, and the former
alternative argument is selected as the preferred argument. The testing
continues and, when the relative pronoun prediction is fulfilled, the same
activating process is carried out as when который was analyzed in the last
example. Of course, this time котором cannot be accepted as the subject.
The activated predictions are now in a position to make the selection of
усреднение as the subject of the clause and осуществляется as the predicate
of the clause.

A relative pronoun can occur in another manner that cannot be
correctly analyzed on a single pass by the existing program. This format
can be exemplified by the clause ..., конденсатор которого... (Fig. 5-18).
When the alternative arguments of конденсатор are tested, there is no way
of knowing that the relative pronoun которого will occur immediately after
конденсатор, and that конденсатор is the subject of the clause. Very often,
as in the example, the noun preceding the relative pronoun does not
intersect with any of the existing predictions and its preferred argument is
selected as an arbitrary choice. The best that can be done in such a situation,
in the framework of predictive syntactic analysis, is to recognize the
relative pronoun when it finally occurs and preserve the necessary information
for the analysis to be corrected on the next pass.


A Segment of a Subordinate Clause
Fig. 5-18

A Segment of a Subordinate Clause
Fig. 5-19

There is one other problem in clause identification that will be
discussed. Some words, such as что, can intersect with both the relative
conjunction and relative pronoun predictions. If что is used as the relative
pronoun, then it is also the subject or the object of the clause. Most of
the time что is used as a relative conjunction, so that the prediction for
relative conjunction is placed higher in the pool. If что is used as a
relative pronoun and is the subject of the clause, no subject would be
found in the clause, as in the example что исключает наложение
(Fig. 5-19). On the next pass, the relative pronoun alternative argument
will be selected as the preferred argument.

8. The Conjunction и

Only one use of the conjunction и, namely its use as a link to
connect two similar words, has been considered in the predictive syntactic
analysis program so far. The linking property can be expressed completely
in terms of predictive analysis. И may link the word following it with any
word located in a nested structure that has not been completed. This is
illustrated schematically in Fig. 5-20. Parentheses have been placed around
every nested structure. The parentheses have been numbered and the right
parentheses have been marked with primes.

Schematic Representation of a Sentence with и
Fig. 5-20

Preceding the и, the sentence has one clause indicated by left
parenthesis 1. Within the clause, the subject noun phrase has been
completely analyzed (parentheses 2 and 2'), while the predicate verb phrase is
still open (parenthesis 4). A participial phrase has been completely
analyzed (parentheses 3 and 3'), and a prepositional phrase that is part of
the verb phrase is open (parenthesis 5). The word following и can be linked
to either one of the words within the prepositional phrase or to one of the
words in the verb phrase of the clause; however, the word following и cannot
be linked to any word of the participial phrase or the subject noun phrase,
since those nested structures have already been completed. If the link
turns out to be with a word in the verb phrase, then the prepositional phrase
is considered completely identified.

If a prediction for a possible link is made by inserting a prediction
for a compound subject, compound noun complement, etc., into the prediction
pool, then the addition of an end wipe prediction directly below the compound
prediction identifies the nested structure. Since the compound prediction
cannot be fulfilled except by a word following an и, the predictions are
made inactive by means of PSI = 99. When an и is identified, these
predictions are activated for a single cycle, that is, for the testing cycle
of the next word. If an activated compound prediction is not fulfilled and
is not wiped, then it is deactivated when the prediction pool is updated.
This activation and deactivation process is carried out completely with the
modification of the prediction span indicators. A PSI of 99 indicates that
the prediction is inactive, and 49 indicates that the prediction has been
activated for one testing cycle, after which it is reset to 99.

In the prediction pool, the compound predictions are ordered in
the inverse order of occurrence. The last compound prediction made is the
one nearest the top of the pool and will be tested for fulfillment first.
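
The one-cycle activation and the inverse ordering can be sketched as follows. The PSI values 99 and 49 are taken from the text; the class `Pool` and its method names are hypothetical, and the sketch ignores fulfillment and wiping.

```python
INACTIVE_PSI = 99   # from the text: the compound prediction is inactive
ACTIVE_PSI = 49     # from the text: activated for one testing cycle


class Pool:
    def __init__(self):
        self.predictions = []   # index 0 is the top of the pool

    def predict_compound(self, name):
        # The most recent compound prediction goes nearest the top.
        self.predictions.insert(0, {"name": name, "psi": INACTIVE_PSI})

    def conjunction_identified(self):
        # Identification of the conjunction activates the compound predictions.
        for prediction in self.predictions:
            if prediction["psi"] == INACTIVE_PSI:
                prediction["psi"] = ACTIVE_PSI

    def update(self):
        # After the next word's testing cycle, unfulfilled predictions are reset.
        for prediction in self.predictions:
            if prediction["psi"] == ACTIVE_PSI:
                prediction["psi"] = INACTIVE_PSI

    def active(self):
        return [p["name"] for p in self.predictions if p["psi"] == ACTIVE_PSI]


pool = Pool()
pool.predict_compound("compound preposition complement")
pool.predict_compound("compound noun complement")
before = pool.active()
pool.conjunction_identified()
during = pool.active()
pool.update()
after = pool.active()
print(before, during, after)
```

Because the later prediction sits on top, it is the first to be tested against the word following the conjunction.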

Several examples from text material will be presented, indicating
several types of problems involved in the identification of the linkage
between the words.

A simple example of compounding is shown in the participial phrase:
позволяющие выделить основную частоту периодического сигнала и измерить...
(Fig. 5-21). The infinitive verb выделить is identified as the verb master
of the participle позволяющие. The attributed argument makes a prediction
of a compound verb master which is eventually activated when the conjunction
и is identified. The word following и is then also a verb infinitive,
измерить, and the only intersection is with the now activated compound verb
master prediction. Meanwhile, the nested object noun phrase string,
основную частоту периодического сигнала, has been recognized. Since the
linkage of измерить goes beyond the object string to выделить, the
identification of the noun phrase is completed.

Consider next the prepositional phrase: ...из цепочки запертых
мультивибраторов и управляющей схемы (Fig. 5-22). The noun цепочки is
identified as the preposition complement of из, after which the noun phrase,
запертых мультивибраторов, is identified as the noun complement of цепочки.
A compound preposition complement in the genitive case is predicted by the
attributed argument of цепочки, and a compound noun complement (obviously in
the genitive case) is predicted by the attributed argument of запертых.
These two predictions are activated by the conjunction и.

When the four alternative arguments of управляющей are tested against
the predictions in the pool, the two compound predictions are at the top of
the pool. Since both predictions can be fulfilled by a genitive adjective
alternative argument, the attributed argument is compound noun complement,
while the second intersection of compound preposition complement is noted
on the hindsight file. Finally, схемы is identified as the master of the
compound noun complement управляющей. There is no way of determining on the
basis of a syntactic analysis whether управляющей схемы is the compound noun
complement or the compound preposition complement. A second pass would have
to recognize that this ambiguity exists and list both interpretations as
possible ones.

Another prepositional phrase brings up another interesting difficulty:
...приведут к нарушению компенсации равновероятных положительных и
отрицательных выбросов шума (Fig. 5-23). The noun нарушению is identified
as the preposition complement of к, after which the alternative arguments
of компенсации are tested against the predictions in the pool. Because of
the ordering of the predictions in the pool, the argument attributed to
компенсации is the noun complement of нарушению, while the second possible
intersection of object of the verb приведут is noted on the hindsight file.
Here, once more, is an example of an ambiguous situation which cannot be
resolved by means of a syntactic analysis alone.

In any event, the string равновероятных положительных is identified
as part of a noun phrase acting as a noun complement of компенсации. At this
time the following predictions are at the top of the pool:

(1) Master (of положительных)

(2) Compound noun complement (of равновероятных)

(3) Compound noun complement (of компенсации)

(4) Compound preposition complement (of нарушению)

(5) Object (of приведут)

The first of the three alternative arguments of отрицательных intersects
with the first three predictions in the pool, and the attributed argument
is master of положительных. Since the noun phrase has not been completed,
отрицательных is linked either with равновероятных or положительных.
However, the distinction cannot be drawn on syntactic lines and a mistake
can occur.

The remainder of the examples will be concerned with compound
subjects and predicates. The first example is a compound predicate following
the subject: импульсы формируются буферными лампами и подаются
(Fig. 5-24). Подаются agrees with формируются in person, number, tense,
and voice, and thereby intersects with the compound predicate head
prediction generated by the attributed argument of формируются. The object
noun phrase, буферными лампами, is then identified, and the process is
terminated.

A somewhat more interesting example is: ... что синхронный фильтр
пригоден и дает (Fig. 5-25), in which the indicative verb дает is
compounded with the short form adjective пригоден, which is acting as the
predicate of the clause.

A Segment of a Sentence
Fig. 5-24

If the subject is a compound one, and if the predicate head
prediction has already been modified to accept only a singular predicate,
then it is necessary to modify the predicate head prediction again, so that
the prediction can now accept a plural predicate only. For example, in:

отсутствие нелинейных эффектов и постоянство ... обеспечиваются ...

(Fig. 5-26), when отсутствие is recognized as the subject of the clause,
the predicate head prediction is modified, so that a predicate must agree
with the subject in number. When постоянство is later recognized as the
compound subject, the prediction is modified once more, so that it can only
accept a plural predicate such as обеспечиваются.

Although и and или have been temporarily grouped together, it is
obvious that in the above case they must be treated differently. If the
clause had read ...или постоянство instead of ...и постоянство, then the
predicate head should not have been modified to accept a plural predicate.
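
The two modifications of the predicate head prediction can be summarized
in a short Python sketch. It is offered only as a reading of the paragraphs
above: the function name and the string labels are hypothetical, and the
treatment of или encodes what the text says should happen, not what the
experimental program did at the time.

```python
def required_predicate_number(first_subject_number, conjunction=None):
    """Number the predicate head prediction will accept.

    first_subject_number -- "singular" or "plural", from the first subject
    conjunction          -- None for a simple subject, otherwise the
                            conjunction that introduced the compound subject
    """
    if conjunction == "и":
        # A compound subject joined by "and" accepts a plural predicate only.
        return "plural"
    # Simple subject, or a compound joined by "или": the prediction keeps
    # the agreement set when the first subject was recognized.
    return first_subject_number


simple = required_predicate_number("singular")
joined_by_and = required_predicate_number("singular", "и")
joined_by_or = required_predicate_number("singular", "или")
```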

If the predicate precedes the compound subject, it is usual to find
the predicate agreeing with the first subject, as in the example
...является анализ ... и рассмотрение (Fig. 5-27). It is possible to find a
preceding predicate written in the plural rather than the singular. This
possibility will have to be incorporated in the predictive syntactic
analysis program.

9.  Summary 

Several of the broad problem areas of syntactic analysis which have
been included in the experimental program have been discussed. These problem
areas basically are concerned with the syntactic relationships among
individual words within phrases and clauses, as opposed to the syntactic
relationships among the phrases and clauses themselves. In this section, a
number of other areas that either are being studied at this time or might be
studied in the near future are mentioned.

In the present experimental program, the government of oblique
cases by nouns, adjectives, and verbs has been largely neglected. To some
extent this has been due to a previous deficiency of dictionary coding,

g 

which  has  only  recently  been  rectified. 

Another important area that is being considered is the negative,
in particular, the word не. If не occurs immediately preceding a transitive
verb, the object governed by the verb can be in the genitive case rather than
the accusative case. However, if an adverb intervenes between the verb and
the object, the object must remain in the accusative case.

The first attempts at predicting entire phrases and clauses are being
made. After a comparative adverb, a clause or phrase starting with чем will
be predicted. Every noun will predict a modifier, a prediction that is
normally inactive but which is activated after a comma. This prediction will
be fulfilled by a participle which occurs after the comma and agrees with
the noun in case and number. This same mechanism might prove useful in the
identification of series of words separated by commas. In the near future,
some prepositional phrases should be predicted by verbs. It is estimated
that approximately half of the prepositional phrases that are found could
be tied in this manner to a verb that they modify.

The broadest and most important area for future research is in the
organization of correcting passes. These are necessary, on the one hand,
to correct errors discovered during the first pass and, on the other hand,
to establish the syntactic relationships among the phrases and clauses.

For the reader who wishes to try to analyze Russian sentences
syntactically by means of the technique of predictive analysis, several
sentences have been provided in Appendix F. All the words in these sentences
have been analyzed on a word-by-word basis (see Chap. 3). The grammatical
codes necessary to carry out the syntactic analysis are listed in Tables
3-4, 3-5, 3-7, 3-8, 3-9, and 3-12. The complete details of the coding in
the  texthadic  items  can  be  found  in  Ref,  6, 
Appendix A

NOTATION FOR SEQUENTIAL OPERATIONS

Iverson (Refs. 1 and 2) has proposed a new notation for sequential
operations that is extremely useful over a wide range of data processing
problems. The difficulty in expressing logical processes of automatic
translation in classical forms of representation led to the adoption of
this more powerful notation with minor modifications. Iverson's notation,
as used in this report, is presented in this appendix. Only the operations
used in this report are given.

A representative set of sequential operations is illustrated in
Fig. A-1.


Sum  of  Integers  from  a  to  b 
Fig.  A-1 

Each line of the set of operations is a step, a specification of some
quantity or quantities in terms of some finite operation upon a specified
set of operands. Thus, "s is specified by the sum of the contents of s
and x" is denoted by step 4. At certain branch points in the program more
than one alternate step is specified as a possible successor. One of these
possible successors is chosen according to criteria determined in the step
preceding the branch. The branch is denoted by a set of arrows leading to
each possible successor step, and each arrow is labeled by the condition
under which the corresponding successor is chosen. The step following the
branch step is selected if none of the labeled conditions is met. In
addition, any unlabeled arrow is always considered an unconditional transfer.
Thus, in step 5, if x is equal to "b", the arrow is followed, and the
process terminates. However, if x is not equal to "b", the process continues
to step 6. After performing step 6, the operation always returns to step 4.
Consider the program in Fig. A-1. It is a representation for the
program to sum all the integers from "a" to "b". In steps 1 and 2, s and x
are initialized to "0" and "a" respectively. If "a" > "b", the program
terminates at step 3. The adding operation takes place in step 4. If x is
not equal to "b" in step 5, the program continues to step 6, after which it
returns to step 4. The symbol applied to x in step 6 represents the first
successor of x, that is, x + 1. (A similar symbol represents the first
predecessor of x.) This process is continued until x = "b", when the arrow
terminating the program will be followed.
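
For readers more at home with a present-day programming language, the six
steps of Fig. A-1 can be transcribed into Python as follows. The function
name sum_integers is hypothetical; the step numbers in the comments are
those used in the description above, and the loop stands in for the arrows
of the figure.

```python
def sum_integers(a, b):
    """Sum the integers from a to b, following the steps of Fig. A-1."""
    s = 0                # step 1: s is specified by 0
    x = a                # step 2: x is specified by a
    if a > b:            # step 3: branch; terminate when a > b
        return s
    while True:
        s = s + x        # step 4: s is specified by the sum of s and x
        if x == b:       # step 5: branch; terminate when x equals b
            return s
        x = x + 1        # step 6: first successor of x, then back to step 4


total = sum_integers(1, 4)
```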

Since zero occurs frequently in comparisons, it is convenient to omit
it. Thus, if a variable stands alone at a branch point, comparison with zero
is implied. Moreover, since comparisons on an index frequently occur
immediately after it is modified, a branch at a point of modification will
denote branching on the indicated index, the comparison occurring after
modification.


Scalars, vectors, and matrices will be used in the operations and
will be indicated, respectively, by lower case letters (x), lower case letters
underlined (x), and upper case letters underlined (X). Components of a
row vector will be indicated by a subscript, x_i, while components of a
column vector will be indicated by a superscript, x^i. A vector will be
assumed to be a row vector unless otherwise specified. Rows of a matrix
will be designated by a superscript, X^i, and columns by a subscript, X_j.

One general restriction on operations is the need for compatibility of the
operands. Compatibility conditions (shown in column 4 of Table A-1)
concern the dimensions of the operands (ν(x), ν(X), μ(X)) and in most cases
the dimensions must be equal.

The weight of a logical vector x, that is, a vector every component
of which is a "1" or a "0", is denoted by σ(x) and is defined as the number
of unit components in x. The logical head vector h^j contains a "1" in the
first j positions, and the logical tail vector t^j contains a "1" in the
last j positions. A logical vector with but one "1" in the j-th position is
a unit vector, and a logical vector x with σ(x) = ν(x) is a full vector,
denoted by f.

A  list  of  the  operations  that  are  used  in  this  report  follows. 

The  operations  are  summarized  in  Table  A-1. 

1. Scalar replacement: If c is a scalar and u is a logical
vector, and x = cu, then x_i = c u_i. Thus, if u = [1,0,1,0,1], then
x = [c,0,c,0,c].

2. Negation: If ū is the negation of u, then ū_i = 0 if u_i = 1
and ū_i = 1 if u_i = 0. In the example of 1, x = cū = [0,c,0,c,0].

3. Logical sum: If w = u ∨ v, then w_i = u_i ∨ v_i;
ν(w) = ν(u) = ν(v). Thus, if u = [1,0,1,0,1] and v = [0,0,1,1,0],
then w = [1,0,1,1,1].

4. Logical product: If w = u ∧ v, then w_i = u_i ∧ v_i, with the
same compatibility conditions as in 3. Using the same u and v,
w = [0,0,1,0,0].

5. Compression: If x = u/y, then the vector x contains only
those components of the vector y for which u_i = 1, and x is ordered on y;
ν(y) = ν(u). If u = [1,0,1,0,1] and y = [y_1,y_2,y_3,y_4,y_5], then
x = [y_1,y_3,y_5].

The compression operation is extended to matrices as follows: a row
compression, denoted by u/X, compresses each row vector of the matrix X to
form a new matrix of dimension μ(X) x σ(u). Column compression, denoted by
u//X, compresses each column vector to form a matrix of dimension
σ(u) x ν(X). Compatibility conditions are ν(u) = ν(X) for row compression
and ν(u) = μ(X) for column compression.

In the event of compression by a logical head or tail vector of a
vector x such that σ(h) or σ(t) > ν(x), the resultant is defined such that
h/x = x and t/x = x.

6. Mesh: Given the vectors x and y and the logical vector u,
the mesh of x and y under the control of u, \x, u, y\, results in a vector z
such that u/z = y and ū/z = x. ν(u) = ν(z), σ(u) = ν(y), and σ(ū) = ν(x).
If u = [1,0,1,0,1], x = [2,3], and y = [7,8,9], then
z = \x, u, y\ = [7,2,8,3,9].

7. Logical reduction: If v = x R y, then v_i = 1 if and only if
x_i R y_i. Any relation can be substituted for "R". In particular, if "R"
is "=", then for x = [1,2,4,8] and y = [1,3,8,8], v = [1,0,0,1].

8. Mapping: The mapping vector x = M(y ← z) is defined as follows:

      x_i = 0   if u = 0,
      x_i = j   if u ≠ 0, where j is the position of the first "1" in u,

where u = (z_i = y); ν(x) = ν(z).

Each z_i is compared with all components of y. If one or more of the
components of y corresponds with z_i, x_i = j, where y_j is the first
component of y that is equal to z_i. If there is no y_j that corresponds
with z_i, x_i = 0. Since vectors can have repeated components, it is
necessary to restrict the correspondent to the one occurring first.

For example, if y = ["A", "L", "A", "B", "A", "M", "A"] and
z = ["L", "A", "B", "R", "A", "D", "O", "R"], then
M(y ← z) = [2,1,4,0,1,0,0,0] and M(z ← y) = [2,1,2,3,2,0,2].
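
The vector operations of items 2, 5, 6, and 8 can be tried out with a few
lines of Python. The sketch below is only an aid to reading the notation:
vectors are represented as Python lists, the function names are
hypothetical, and the test vectors are the ones used in the examples of
this appendix (the letters of ALABAMA and LABRADOR for the mapping).

```python
def negate(u):
    """Negation of a logical vector: each 1 becomes 0 and each 0 becomes 1."""
    return [1 - ui for ui in u]


def compress(u, y):
    """Compression u/y: keep the components of y for which u_i = 1."""
    if len(u) != len(y):
        raise ValueError("u and y must have the same dimension")
    return [yi for ui, yi in zip(u, y) if ui == 1]


def mesh(x, u, y):
    """Mesh of x and y under u: y fills the 1 positions, x the 0 positions."""
    if u.count(1) != len(y) or u.count(0) != len(x):
        raise ValueError("weights of u do not match the dimensions of x and y")
    xs, ys = iter(x), iter(y)
    return [next(ys) if ui == 1 else next(xs) for ui in u]


def mapping(y, z):
    """M(y <- z): for each z_i, the 1-based index of the first equal
    component of y, or 0 if no component of y equals z_i."""
    return [y.index(zi) + 1 if zi in y else 0 for zi in z]


u = [1, 0, 1, 0, 1]
z = mesh([2, 3], u, [7, 8, 9])
m1 = mapping(list("ALABAMA"), list("LABRADOR"))
m2 = mapping(list("LABRADOR"), list("ALABAMA"))
```

Compressing the meshed vector by u and by its negation returns the two
original vectors, which is the defining property of the mesh.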

The notion of a file (Φ) has been adopted to allow for the
description of magnetic tape as a serial access memory. The operation of
transferring an element from a file is reading (x ← Φ), and the operation
of transferring an element into a file is recording (Φ ← x). A file can
be read (or recorded) in the forward (denoted by 0) or backward (denoted
by 1) direction, which will be indicated by the left subscripts. Right
subscripts are used merely as indices. A file which is only recorded in an
algorithm is an output file, and a file which is only read is an input file.

Both scalars and vectors can be read (or recorded) into files. In
any step, the item to be read (or recorded) will serve to identify whether
a scalar or vector is being read (or recorded). Thus, it is possible in
one step to record a vector in the forward direction on a file, and
in the next step, to read a scalar in the backward direction. In
this event, the last element of the vector that was recorded in the file
will be read out.
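
The behavior described in the last paragraph can be imitated with a small
Python class. The sketch assumes a simple model that the appendix does not
spell out: the file is a list of elements with a single head position that
moves forward on a forward operation and backward on a backward one. The
class and method names are hypothetical.

```python
class SerialFile:
    """Toy model of a serial access file with one head position."""

    def __init__(self):
        self.items = []
        self.head = 0          # index of the element just ahead of the head

    def record_forward(self, value):
        """Record a scalar, or each component of a vector, moving forward."""
        values = value if isinstance(value, list) else [value]
        for v in values:
            # Overwrite the element under the head, or append at the end.
            self.items[self.head:self.head + 1] = [v]
            self.head += 1

    def read_forward(self):
        """Read the element ahead of the head and advance past it."""
        v = self.items[self.head]
        self.head += 1
        return v

    def read_backward(self):
        """Step back one element and read it."""
        if self.head == 0:
            raise IndexError("already at the beginning of the file")
        self.head -= 1
        return self.items[self.head]


f = SerialFile()
f.record_forward([3, 1, 4])    # record a vector in the forward direction
last = f.read_backward()       # read a scalar in the backward direction
```

In this model the scalar read backward is the last component of the vector
just recorded, as stated above.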
1. 1st Successor
2. 1st Predecessor
3. Weight
4. Head vector
5. Tail vector
6. Full vector
7. Unit vector
8. Identity permutation vector
9. Scalar replacement
10. Negation (not)
11. Logical sum (or)
12. Logical product (and)
13. Vector compression
14. Matrix row compression
    Matrix column compression
15. Vector mesh
16. Logical reduction "R"
16a. Logical reduction "R" = "="


0  a  h*-l 
0  *  b-1 

a  I-*-*- 1  i  J 
a_.  *  l-*-^  1  <  J 


a  l<-»-  i  a  J 


k 


Dlaenelon  defined 
bgr  eofitest. 


8^  * 


»i  =  “1 '  ■'i 
•i  “  "1-  ■'i 


1^(5)  =  i^(u) 
v(v)  a  I/(a) 
v(v)  3  i/(a)  =  v{r) 
«'(■)  =*  >'(«)  =  v(j) 


€  0  0^  a  1  «'(a)  3  l/(u)5  1/(0)  3  ^(u) 

v{l)  3  v(a)|  1^(0)  =<r(u)} 


Si  =  Vii 


^5  =*  Ip  ya  =*  :sr 


5^  a  1  (r^  E 


^i(C)  3  /x(4) 

/*(A)  3  ^(u)j  ^(c)  3  <T(ji)| 

v{G)  3  1/(4) 

J'(x)  3cr(2){  i/(j)  3  cr(u)} 
v(s)  3  v(2j) 

Hs)  =  v{a)  3  t/(;^) 


3^  3  (a^  a  b^j 


Operations Used in Programs in This Report
TABLE A-1


17. Mapping

5  ** — M(a 

0^  a  O'*— ►  u  »  0 

vis)  ^ 

where  u  —  (^i  1  * 

) 

18. Read from file

(forward direction)

x.-o4> 

19.  Record  onto  file 

(backward  direction) 

I*-* 

TABLE A-1 (continued)

Appendix B

ERRATA SHEET FOR TABLES IN


1. Matejka, L., "Grammatical Specifications in the Russian-English
Dictionary," Design and Operation of Digital Calculating Machinery,
Report No. AF-50, Harvard Computation Laboratory, Section V (1958).

a. p. V-21: In class NI.4, the "Gp" and "GpAp" listed
for affix -on should be listed instead for affix -en.

b. p. V-30: Line (11) should contain "GpApPp" wherever
there is "GpAp".

c. p. V-31: Line (7) should contain "NsAs" in column "H"
of "A6+A7+A8" with an asterisk to a note stating that this
refers to class A8 only.

d. p. V-31: Line (25) should contain "GpApPp" wherever
there is "GpAp".

2. Matejka, L., "The Automatic Interpretation of Russian Verbal
Endings," Mathematical Linguistics and Automatic Translation,
Report No. NSF-2, Harvard Computation Laboratory, Section III (1959).

a. p. III-16:

(1) The entry in "V4.02", "y", should be
"A" instead of "Clsl".

(2) The entry in "V4.I", "w", should be
"B5" instead of "A".

(3) The entry in "V4.ll", should be
"B5" instead of "A".

b. p. III-17: The entry in "V5.3", "ot©", should be "A"
instead of "Clp2".


c. p. III-19:

(1) The entry in "VI?", "a", should include "B5" in
addition to "B3p".

(2) The entry in "719", "ara", should not be
shaded in.

(3) The entry in "V15.1", should be "B3"
instead of "A".

(4) The entry in "V19", "o", should include "B5"
in addition to "D3fs".
Appendix C

GENERATING AFFIX - AFFIX MAPPING FOR THE EXPERIMENTAL DICTIONARY


The first two columns of the tables in this appendix are reproduced
from Magassy's original paradigm tables (Ref. 1). The generating affixes
are listed for all but the canonical form.

The affixes that are factored in the place of each generating affix
of each class (Sec. 3.3) are listed in columns 3 and 4. If the stem
remaining after the affix is factored is the same stem that remains after
most or all of the affixes of the paradigm are factored, then the affix is
listed in column 3. Otherwise, the affix is listed in column 4. The affixes
in the fourth column are a result of false factoring (Sec. 3-2B).

No attempt has been made in these tables to include information on
vowel insertion or vowel substitution in the generating stems. Reference
for such information should be made to Oettinger's updated tables (Ref. 2).

If an affix appears twice in a class, each occurrence is identified
with an asterisk, denoting the presence of a potential artificial affix
homograph in the experimental dictionary.

Appendix D

FLOW CHARTS FOR ANALYZER PROGRAMS

Adjective Analyzer Program
Chart No. 1

Flow Chart D-2 (continued)
Chart No. 5


Note 1: Insert a "1" into character position 8 of the organized word.
Note 2: Insert a "2" into character position 8 of the organized word.
Note 3: Insert a "3" into character position 8 of the organized word.
Note 4: Insert a "1" into character position 9 of the organized word.
Note 5: Insert a "2" into character position 9 of the organized word.
Note 6: Insert "Ns and As" only if affix of canonical form is on.
Note 7: Mark "INCOMPATIBLE" in word 24.


Flow Chart D-3 (continued)
Chart No. 5
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13.00  • 

■  s 

•K 

636 

617 

10 

EM 

01  .00 

;j9 

IS 

34 

02.00 

9 

0 

c 

o 

fO 

c 

'0 

in 

04,00  ♦ 

472 

21  1 

21  1 

06.00 

1  1 

1  1 

10.00 

319 

31° 

12.00 

4 

n 

13.00 

6 

6 

830 

574 

256 

EMU 

01  .00 

8 

(I 

4 

04,00 

2''0 

1  15 

115 

07.00 

7 

7 

245 

1  19 

126 

HT 

oi  .00 

179 

66 

04,00 

6 

3 

72 

22 

06.00 

6 

6 

08,00 

4 

n 

167 

70 

97 

ET  SJA 

O]  ,00 

1 

I 

1 

1 

ETE 

01  ,0O 

73 

1 1 

12 

04,00 

2 

1 

1 

13.00 

1 

1 

26 

12 

14 

TABLE E-1(d) (continued)


AFnX 


CLASS 

MASKER 


{  >  J 


f. 


OE 


OJ 


OH 


OHU 


T» 


UT  SJA 
UJU 


UJU.  S » 
Y 


YE 


YMI 

YX 


OUOO 

02*0r' 

04.31 

06.00 

01.00 

01.00 

01.10 

01.30 

02.00 

04.00 

04.0? 

04.10 

04.30 

04.31 

06.00 

01.00 

01.10 

01.20 

01.30 

02.00 

03.0* 

03.10 

04.00 

04.10 

06.00 

OS.OO 

oa.i« 

09.9® 
01.00 
01.10 
02.00 
04.00 
01.00 
C6.00 
10.00 
01.00 
01.10 
01.20 
01.30 
03.0? 
03.10 
04.00 
04.10 
04.30 
04.31 
06.00 
06.1* 
99.9® 
01.00 
05.0* 
06.00 
01.00 
01. in 
02.00 
06.00 
04.00 
01,00 
01  ,20 
03.0? 
04,00 
04,0* 
06,00 
02,00 
04,00 
06.00 
06.00 
02,00 
04.00 
06.00 
02,00 
04,00 
02,00 


SABLE  I 


AFFIX 

1  PER  CLASS  MARKER 

1  PER  AFFIX 

TOrAL 

COAlPATOie 

mCOUMTiSce 

1  total 

CONfATIOLC 

inooumTiQLC 

# 

01  .00 

3A9 

36« 

mm 

'69 

A 

nl  ,0^ 

1U5 

14*^ 

AH 

01  .0^ 

>l 

21 

■Bl 

AH  I 

01. 

'6 

36 

H 

AJA 

01.00 

205 

205 

205 

V  SJA 

01 .00 

«0 

10 

10 

10 

E 

01.00 

2«7 

257 

257 

257 

E  S» 

01 .00 

•  2 

12 

12 

12 

EGO 

01  ,0O 

4^2 

462 

462 

462 

EE 

01  .00 

414 

41U 

414 

414 

EJ 

01  .00 

05 

85 

85 

85 

EH 

01.00 

106 

IB6 

166 

186 

FHU 

01  .00 

'5 

35 

35 

35 

1 

01.00 

106 

196 

196 

196 

IE 

01  .00 

fO 

70 

70 

70 

tJ 

01  .00 

7 

7 

7 

7 

in 

01  .00 

234 

234 

234 

\Hl 

01.00 

49 

40 

49 

49 

IX 

01  .00 

7«1 

7S1 

01  lOo 

01 

81 

862 

862 

0 

01,00 

1BU4 

1844 

1844 

1  844 

OV 

01.00 

2 

2 

2 

2 

OGO 

01  ,00 

4*^1 

471 

471 

471 

OE 

01 .00 

171 

171 

171 

OJ 

01 .00 

506 

585 

I 

566 

585 

1 

OM 

01 .00 

4«7 

447 

447 

447 

OHU 

01  ,00 

74 

7U 

74 

74 

U 

01  .00 

'5 

35 

35 

35 

UJU 

01  .00 

«8 

4« 

46 

4B 

Y 

ol  ,oo 

306 

306 

306 

'06 

YE 

01.00 

IPO 

180 

160 

180 

YJ 

01  ,0O 

107 

107 

107 

107 

YM 

01,00 

’6 

36 

36 

36 

YHI 

ol  ,oo 

74 

24 

24 

YX 

Ot  ,oo 

.203 

205 

202 

1 

( 

Ol  ,00 

2 

2 

2 

JU 

01  .00 

57 

27 

27 

JA 

01 ,00 

7 

7 

7 

R225 

8P23 

2 

I' 


Frequency of Reference to Pronominal Dictionary Entries

TABLE E-1(e)


AFFIX 


PER  CUSS  MARKER 


total  iOOriAATiflU  'ACOUPATOlC 


PER  AFFIX 


TOTAL  I  COUrAliSLL 


?63fe  2005  631 


372  ?70  1 02 


UT  SJA 


380 


271 


T09 


AFFIX 


PCH  CLASS  MARKER 


PER  AFFIX 


CLASS 


1  MAAKCR 

roTAC 

COUMTlILI 

iNcouMniu 

TOTAL 

COtMTUlL 

t  IKCOWMTeLe 

R 

a 

4 

1 

1 

1 

1 

16*0^ 

6 

6 

34 

22 

12 

UJU 

01 .00 

6 

6 

04.00 

2 

2 

a 

a 

UJU  s* 

03,00 

3 

3 

3 

Y 

01.00 

143 

l-IS 

03,00 

39 

38 

04.00 

16 

56 

04.01 

31 

31 

04.10 

1 

1 

06.20 

3 

3 

274 

2711 

VE 

01  .00 

12 

12 

04,00 

12 

12 

24 

24 

YJ 

oi.oo 

6 

6 

6 

6 

YH 

01  .00 

>2 

32 

04.00 

4 

4 

36 

36 

YHI 

01  .00 

16 

16 

04.00 

3 

3 

21 

21 

YX 

01 .01' 

14 

U 

04,00 

25 

25 

04.01  , 

2 

2 

‘M 

41 

t 

01  ,oo 

3 

3 

03.00 

9 

9 

04,00 

7 

7 

■  * 

04,20 

1 

1 

00.20 

3 

3 

09.00 

4 

u 

09,10 

3 

11.10 

1 

I 

?l  .OA 

2 

? 

99,99 

5 

3 

36 

12 

24 

JU 

01  .00 

14 

1? 

2 

04.01 

30 

1 

29 

06,10 

1 

1 

45 

31 

JUT 

01. OA 

415 

317 

96 

03.  OA 

125 

90 

26 

12. OA 

?4 

24 

15.20 

1 

1 

16.00 

32 

32 

597 

157 

JUT  SJA 

01  ,00 

533 

44^ 

190 

03,00 

42 

40 

2 

12.00 

17 

1? 

5 

15. 2A 

2 

2 

10. vO 

14 

14 

708 

213 

JA 

01 .00 

10 

u 

6 

03.00 

32 

24 

B 

04,00 

25“5 

7^ 

2473 

04,01 

142 

? 

140 

04,02 

2 

1 

1 

04,20 

1 

1 

05.40 

2 

1 

1 

06.10 

1 

1 

06.20 

1 

1 

08.20 

10 

o 

1 

14,00 

1 

1 

16.00 

1 

\ 

20.00 

32 

32 

?7B1 

117 

2664 

JA  S» 

03.00 

10 

Ifl 

16 

18 

JAH 

01  .00 

1 

1 

04.01 

20 

20 

21 

21 

JAHI 

01  ,00 

4 

4 

03.00 

1 

1 

04,01 

15 

15 

20 

20 

JAT 

01  .00 

1 

1 

04,00 

06 

42 

4 

04.01 

1 

1 

04*02 

4 

U 

06.10 

16 

16 

06,20 

17 

17 

OB.OO 

1 

1 

66 

80 

6 

JAT  SJA 

Ql.OO 

10 

10 

04,00 

82 

44 

38 

04.01 

7 

2 

04.0? 

7 

106 

56 

50 

JAX 

01,00 

51 

51 

04,01 

26 

26 

77 

77 

JAJA 

01,00 

66 

19 

04,00 

32 

32 

04.01 

19 

19 

117 

■a 

70 

jajas» 

01.00 

2 

B 

2 

B 

22265 

10200  ] 

2065 

L 


Houna 

Affix 

Frequency 

- 1 

Percent. 

CvuKulatiye 

Pe2*oente/?0 

Affix 

Frequency 

i— . . 

i’eroent 

01010101170 

Percentage 
. . 

h$72 

iao3 

3^75 

3106 


a:>*C 

27,8 

39.2 

h9,l 

J57.8 


98.2 
98, li 
98.6 
98.8 
99.0 


90.0 

91.3 
92.1i 

93.1 
93,8 

9i4.5 

95.1 

95.7 

96.2 

96.6 

97,0 

97.3 
97.6 

97.8 
98.0 


Frequency of Reference to Nominal Dictionary Entries by Affix
(Both Compatible and Incompatible) in Frequency Run V

TABLE E-2


k 

3 

1 

35875 


.2  100.0 


AdjectiviBs 


Affix  Fratfionoy  Percent 


^  Verbs 

Affdoc 

Freq\icncy 

Percent 

Cumulative 

Percentage 

Affix 

Frequency 

Percent 

Cumulative 

Percentage 

ex 

1|3^3 

19.^ 

19.^ 

He 

51i 

.2 

97.8 

Tb 

3008 

ly.^ 

33.0 

ax 

Ii8 

.2 

98.0 

H 

2799 

12.6 

1*^.6 

auH 

ii7 

.2 

98.2 

OT 

130^ 

^.9 

51.^ 

K) 

li5 

.2 

98.li 

K 

1269 

17 

^7.2 

BIX 

ill 

.2 

98.6 

eu 

U70 

^.3 

62. ^ 

BDi 

36 

MT 

1068 

li.8 

67.3 

b 

36 

KU 

1018 

it.6 

71.9 

ax 

29 

a 

9hh 

li.2 

76.1 

eB 

28 

e 

8?1 

3.8 

79.9 

Bie 

2ii 

.7 

99.3 

# 

76h 

3.k 

83.3 

oe 

21 

0 

^eh 

2.6 

8^.9 

Bsm 

21 

OM 

h93 

2.2 

88.1 

aii 

21 

yx 

lilit 

1.9 

90.0 

tdi 

20 

aa 

286 

1.3 

91.3 

auH 

20 

.ii 

99.7 

u  ■ 

27l^ 

1.2 

92.5 

eK  - 

17 . 

— 

— 

ax 

.  192 

.9 

‘  93.il 

oro 

13 

OB 

191 

.9 

9li.3 

yx) 

11 

7 

1^1 

.7 

95.0 

m 

10 

oii 

129 

.6 

95.6 

Bm 

8 

) 

\ 

aa 

119 

96,1 

6 

ii 

lOh 

96.6 

sxe 

3 

ee 

89 

97.0 

loat 

1 

ax 

77 

.3 

97.3 

Mxe 

1 

B 

^8 

.3 

97.6 

MX 

1 

.3 

100.0 

22265 

Frequency of Reference to Verbal Dictionary Entries by Affix
(Both Compatible and Incompatible) in Frequency Run V

TABLE E-4


Appendix  F 

THE  SUBROUTINES  IN  THE  EXPERIMENTAL  PREDICTIVE 
SYNTACTIC  ANALYSIS  PROGRAM 




The explicit instructions for the operation of the experimental
predictive syntactic analysis technique are presented in this appendix.
Also included is a list of the function type and essence subroutines, as
well as the PSI associated with each of the latter (Table F-1). The
description of the different PSI is given in Table F-2. The abbreviations
that are used in the main tables are listed in Table F-3, and in Table F-4,
an outline of the format of the main tables is given.

The detailed operation of each subroutine is presented in Table F-5.
An illustration of the use of this table will help familiarize the reader
with the process. Consider the process when a subject prediction is being
tested against the alternative argument, /noun, nominative, singular,
masculine/ of a noun such as студент, the first word in a hypothetical
sentence. The first entry under "Subject-E" in the table of essences
signifies that the prediction was made either by initial, that is, at the
beginning of a sentence, or by comma. If made by comma, the prediction is
inactive initially, that is, its PSI = 51. It should also be noted that
the subject prediction can be modified by a verb predicate head or an
adjective predicate head if either of them precedes the subject.

The subject prediction can be fulfilled, on the one hand, by a
noun, adjective, pronoun, or numeral alternative argument that is either
nominative singular or nominative plural and, on the other hand, by an
infinitive verb. Information in a reserved register indicates whether or
not the predicate head has already been identified. If so, a word that can
be a subject is tested for agreement with it in person, number, and gender.
In the example chosen, студент can be a nominative singular noun. Since it
is the first word in the sentence, there is no information on person, number,
or gender stored in the prediction pool, and студент fulfills the prediction.

If the preferred argument of the noun is subject, it is necessary
to refer to the function type subroutines to determine what new predictions
must be entered in the pool and what predictions already in the pool are
to be modified (or wiped).

The noun function type subroutine first indicates the formal
properties that identify a noun in the experimental program. The twelve
essences that can be fulfilled by a noun are listed next, and, indeed, the
Subject-E essence is among them. This function type subroutine is also
called in by the pronoun and numeral function types.

The first prediction made by this subroutine for every noun
alternative argument is for a noun complement. Since студент was selected as
the subject, control is then transferred to the adjective-noun subject
function type subroutine.

The adjective-noun subject subroutine is accepted by nothing, that
is, control is transferred to this subroutine only from another subroutine,
in this case, the noun subroutine. The adjective-noun subject subroutine
modifies the predicate head prediction if the latter has not been fulfilled
previously. Next, three predictions are put into the prediction pool:
compound subject, infinity, and end wipe. Since no other conditions are
listed, control is transferred back to the skeleton, which initializes the
analysis of the next word.
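
The test applied by the subject essence in this walk-through can be
restated as a short Python sketch. It is a simplified reading of the
description above, not the contents of the subroutine tables: alternative
arguments are represented as dictionaries of grammatical features, the
predicate head register as a dictionary of the features already known, and
all names are hypothetical.

```python
NOMINAL_CLASSES = {"noun", "adjective", "pronoun", "numeral"}


def fulfills_subject(argument, predicate_head=None):
    """Return True if an alternative argument can fulfill the subject
    prediction.

    argument       -- dict of features, e.g. word_class, case, number, gender
    predicate_head -- None if no predicate head has been identified yet,
                      otherwise a dict of the features it requires
    """
    if argument.get("word_class") == "verb":
        candidate = argument.get("form") == "infinitive"
    else:
        candidate = (argument.get("word_class") in NOMINAL_CLASSES
                     and argument.get("case") == "nominative"
                     and argument.get("number") in ("singular", "plural"))
    if not candidate:
        return False
    if predicate_head is None:
        return True        # nothing stored yet, as for a first word
    # Agreement is checked only on features that both sides specify.
    return all(argument[k] == v
               for k, v in predicate_head.items() if k in argument)


# Predictions entered after a noun has been selected as subject, in the
# order in which the text mentions them.
NEW_PREDICTIONS = ["noun complement", "compound subject", "infinity", "end wipe"]

student = {"word_class": "noun", "case": "nominative",
           "number": "singular", "gender": "masculine"}
first_word_ok = fulfills_subject(student)
```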


For the reader who wishes to try the technique of predictive analysis,
several Russian sentences analyzed on a word-by-word basis have been
provided (Figs. F-1 to F-6). The grammar codes necessary to carry out
the syntactic analysis are listed in Tables 3-5, 3-7, 3-8, 3-9, and

Essences (Tester Routines)                   PSI   Function Types (Predictor Routines)

 1. Subject - E                              01     1. Initial
 2. Compound Subject - E                     99     2. Noun
 3. Predicate Head                           01     3. Pronoun
 4. Compound Predicate Head                  99     4. Adjective
 5. Left Object - E                          03     5. Numeral
 6. Compound Left Object - E                 99     6. Verb
 7. Object - E                               01     7. Adverb
 8. Compound Object - E                      99     8. Preposition
 9. Master (essence)                         01     9. Participle
10. Verb Master - E                          00    10. Gerund
11. Compound Verb Master - E                 99    11. Infinite Conjunction
12. Noun Complement - E                      00    12. Relative Conjunction - T
13. Compound Noun Complement - E             99    13. Comma
14. Preposition Complement - E               00    14. Adjective-Noun Subject
15. Compound Preposition Complement - E      99    15. Pronoun Subject
16. Phraser                                  03    16. Verb Subject
17. Relative Conjunction - E                 03    17. Verb Predicate Head
18. Relative Pronoun - E                     03    18. Adjective Predicate Head
19. Infinity                                 02    19. Left Object - T
20. End Wipe                                 01    20. Object - T
21. End of Sentence - E                      01    21. Noun Complement - T
22. Arbitrary Choice                         02    22. Preposition Complement - T
                                                   23. Verb Master - T
                                                   24. -
                                                   25. End of Sentence - T

Index of Subroutines in the Experimental Predictive
Syntactic Analysis Program

TABLE F-1


  PSI
 Value     Description

  00    -  Wiped by end wipe if not fulfilled by next word.
  01    -  Wiped by end wipe. Must be fulfilled for analysis to
           be accepted; therefore, write on hindsight when wiped.
  02    -  Wiped only by end wipe but not when fulfilled.
  03    -  Wiped by end wipe.
  49    -  Changed to 99 by prediction pool updating process; wiped
           by end wipe and end of sentence.
  99    -  Inactive. Activated by infinite conjunction; wiped by
           end wipe and end of sentence.

  PSI + 50 - Inactivated (PSI = 99 is special case.)

Prediction Span Indicators (PSI)
in Experimental Predictive
Syntactic Analysis Program

TABLE F-2
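
The PSI arithmetic in the table can be sketched as follows. This is a minimal illustration with hypothetical function names. The table states only that an inactivated prediction carries PSI + 50 and that PSI = 99 is a special case; treating reactivation as subtracting 50 again is an assumption, as is the restriction of active values to 00-49.

```python
def inactivate(psi: int) -> int:
    """Return the inactivated form of an active PSI (PSI + 50)."""
    if not 0 <= psi <= 49:
        raise ValueError("only active PSI values 00-49 can be inactivated")
    return psi + 50


def is_inactive(psi: int) -> bool:
    """Inactive predictions occupy 50-99; 99 is the special always-inactive case."""
    return 50 <= psi <= 99


def reactivate(psi: int) -> int:
    """Assumed inverse of inactivate, for the range 50-98 only."""
    if not 50 <= psi <= 98:
        raise ValueError("only PSI values 50-98 are reactivated this way")
    return psi - 50


def written_on_hindsight_when_wiped(psi: int) -> bool:
    """Only PSI = 01 predictions must be fulfilled, so only they are recorded."""
    return psi == 1


print(inactivate(3), reactivate(53), is_inactive(99))
```

Note that 49 + 50 = 99, which is at least consistent with the entry for PSI 49, although the table does not say that the pool updating process works this way.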


 Comp'd   -  Compound
 Compl.   -  Complement
 Subj.    -  Subject
 Obj.     -  Object
 Pred.    -  Predicate
 Prep'n   -  Preposition
 Conj.    -  Conjunction
 Adj.     -  Adjective
 PSI      -  Prediction Span Indicator
 OW       -  Organized Word
 CPx      -  Character Position (1 ≤ x ≤ 12)

                1  2  3  4  5  6  7  8  9  10  11  12

 AWx      -  Analyzed Word
                x = 1 : word 24 of 30 word item,
                        or word 06 of texthadic item.
                x = 2 : word 27 of 30 word item,
                        or word 07 of texthadic item.
 TWx      -  Texthadic Word (0 ≤ x ≤ 9)
 GWx      -  Grammatical Word (as kept in experimental
             program). (1 ≤ x ≤ 5)

List of Abbreviations

TABLE F-3


A.  Essence

B.  Predicted by:
       1.
       2.     List of all function types that
       .      predict the essence.
       .

C.  Modified by:
       1.
       2.     List of all function types that
       .      modify the essence.
       .

D.  Grammatical Information required:
       1.
       2.     Description of each word of grammatical
       .      information required by the tester routine
       .      in the order in which it must appear.

E.  Fulfilled by:
       1.
       2.     List of function types recognized and
       .      the intersection test made with each.

F.  Marks:
       XXXXXX     Characteristic marking in TW.

Format of Essence Table

TABLE F-4(a)




A.  Function Type

B.  Characterized by:
       1.
       2.     List of identifying symbols.

C.  Accepted by:
       1.
       2.     List of names of various essences which accept
              this function type.

D.  Called in by:
       1.
       2.     List of names of various function types
              which call in this function type.

E.  Predicts:
       1.
       2.     List of all essences predicted by the routine in
              the order of their consideration, and/or list of
              all essences whose predictions are to be modified,
              with complete instructions for each modification.

F.  Other conditions:

Format of Function Type Table

TABLE F-4(b)
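
For readers who want to transcribe the tables into machine-readable form, the two formats can be mirrored by a pair of record types. The sketch below is hypothetical: the class and field names are invented here to follow the table headings, and `cross_referenced` is an illustrative consistency check between the "Fulfilled by" column of an essence and the "Accepted by" column of a function type. The sample entries are taken from the End of Sentence rows of Tables F-5(a) and F-5(b).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EssenceEntry:
    name: str
    predicted_by: List[str] = field(default_factory=list)
    modified_by: List[str] = field(default_factory=list)
    grammatical_information: List[str] = field(default_factory=list)
    fulfilled_by: List[str] = field(default_factory=list)
    marks: List[str] = field(default_factory=list)


@dataclass
class FunctionTypeEntry:
    name: str
    characterized_by: List[str] = field(default_factory=list)
    accepted_by: List[str] = field(default_factory=list)
    called_in_by: List[str] = field(default_factory=list)
    predicts: List[str] = field(default_factory=list)
    other_conditions: List[str] = field(default_factory=list)


def cross_referenced(essence: EssenceEntry, ftype: FunctionTypeEntry) -> bool:
    """True when the two tables agree that ftype can fulfill essence."""
    return ftype.name in essence.fulfilled_by and essence.name in ftype.accepted_by


end_of_sentence_e = EssenceEntry(
    name="End of Sentence-E",
    predicted_by=["Initial"],
    fulfilled_by=["End of Sentence-T"],
)
end_of_sentence_t = FunctionTypeEntry(
    name="End of Sentence-T",
    accepted_by=["End of Sentence-E"],
    other_conditions=["Wipes prediction pool completely", "Goes to Initial"],
)
comma = FunctionTypeEntry(name="Comma", accepted_by=["Infinity"])

print(cross_referenced(end_of_sentence_e, end_of_sentence_t))
```

A check of this kind is one way to catch transcription slips, since every "Fulfilled by" entry should have a matching "Accepted by" entry.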




ESSENCE: Subject-E
   Predicted by:   1. Initial (both active and inactive)
                   2. Comma (inactive)
   Modified by:    1. Verb Pred. Head
                   2. Adj. Pred. Head
   Grammatical information required:
                   1. Person in CP1 of GW1 (V,Z,T,A)
                   2. Number in CP2 of GW1 (S,P,A)
                   3. Gender in CP1 of GW2 (M,F,N,A)
                   4. If CP3 of GW2 > 0, then Pred. Head has been found
   Fulfilled by:   1. Noun
                   2. Adj.
                   3. Pronoun
                   4. Numeral
                      (1-4: satisfying grammatical information and
                      N in CP1 or CP7 of AW1)
                   5. Verb with F in CP9 of AW1 satisfying
                      grammatical information
   Marks:          1. AASUBJCTA

ESSENCE: Compound Subject-E
   Predicted by:   1. Adj.-Noun Subj.
                   2. Pronoun Subj.
                   3. Verb Subj.
   Modified by:    1. Activated by Infinite Conj.
   Grammatical information required:
                   1. Person in CP1 of GW1 (V,Z,T,A)
                   2. Number in CP2 of GW1 (S,P,A)
                   3. Gender in CP1 of GW2 (M,F,N,A)
                   4. If CP2 of GW2 > 0, must be verb with F in CP9 of AW1
                   5. If CP3 of GW2 > 0, then Pred. Head has been found
   Fulfilled by:   1. Noun
                   2. Adj.
                   3. Pronoun
                   4. Numeral
                      (1-4: satisfying grammatical information and
                      N in CP1 or CP7 of AW1)
                   5. Verb with F in CP9 of AW1 with CP2 of GW2 > 0
   Marks:          1. ACSUBJCTA

ESSENCE: Predicate Head
   Predicted by:   1. Initial (both active and inactive)
                   2. Comma (inactive)
   Modified by:    1. Adj.-Noun Subj.
                   2. Pronoun Subj.
                   3. Verb Subj.
                   4. Left Obj.-T
   Grammatical information required:
                   1. Person in CP1 of GW1 (V,Z,T,A)
                   2. Number in CP2 of GW1 (S,P,A)
                   3. Gender in CP1 of GW2 (M,N,F,H,B,A)
                   4. If CP2 of GW2 > 0, an obj. has been found
                   5. If CP3 of GW2 > 0, a subj. has been found
   Fulfilled by:   1. Verb fulfilling grammatical information, with
                      D in CP9 of AW1
                   2. Adj. Pred. Head, fulfilling grammatical
                      information
   Marks:          1. AAVAPREDA
                   2. AAAAPREDA

ESSENCE: Compound Pred. Head
   Predicted by:   1. Verb Pred. Head
                   2. Adj. Pred. Head
   Modified by:    1. Activated by Infinite Conj.
   Grammatical information required:
                   1. Person in CP1 of GW1 (V,Z,T,A)
                   2. Number in CP2 of GW1 (S,P,A)
                   3. Gender in CP1 of GW2 (M,N,F,H,B,A)
   Fulfilled by:   1. Verb fulfilling grammatical information, with
                      D in CP9 of AW1
                   2. Adj. Pred. Head, fulfilling grammatical
                      information
   Marks:          1. ACVAPREDA
                   2. ACAAPREDA

ESSENCE: Left Object-E
   Predicted by:   1. Initial (active and inactive)
                   2. Comma (inactive)
   Modified by:    1. Wiped by Verb Pred. Head
                   2. Wiped by Adj. Pred. Head
   Grammatical information required:
                   none
   Fulfilled by:   1. Noun
                   2. Pronoun
                   3. Adj.
                   4. Numeral
                      (1-4: with I in CP5 or CP11 of AW1, or
                      A in CP3 or CP9 of AW1, in that order)
   Marks:          1. AALAOBJAA

ESSENCE: Compound Left Object-E
   Predicted by:   1. Left Obj.-T
   Modified by:    1. Activated by Infinite Conj.
   Grammatical information required:
                   1. Case word in appropriate position with zero fill
   Fulfilled by:   1. Noun
                   2. Pronoun
                   3. Adj.
                   4. Numeral
                      (1-4: with appropriate case in AW1)
   Marks:          1. ACLAOBJAA

ESSENCE: Object-E
   Predicted by:   1. Verb
                   2. Participle
   Modified by:    nothing
   Grammatical information required:
                   1. Case and number word in appropriate positions
                      with zero fill, see NAVI notation.
   Fulfilled by:   1. Noun
                   2. Adj.
                   3. Pronoun
                   4. Numeral
                      (1-4: with appropriate case in AW1, using
                      NAVI notation)
   Marks:          1. AAOBJECTA

ESSENCE: Compound Object-E
   Predicted by:   1. Obj.-T
   Modified by:    1. Activated by Infinite Conj.
   Grammatical information required:
                   1. Case word in appropriate position with zero fill
   Fulfilled by:   1. Noun
                   2. Adj.
                   3. Pronoun
                   4. Numeral
                      (1-4: with appropriate case in AW1)
   Marks:          1. ACOBJECTA

Essence Subroutines Used in the Experimental Predictive
Syntactic Analysis Program

TABLE F-5(a)
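
Several entries in this table test or set single character positions of a grammatical word, for example "If CP3 of GW2 > 0, a subj. has been found" under Predicate Head. A minimal sketch of such flag handling is given below. It assumes, purely for illustration, that a machine word can be modeled as a 12-character Python string with "0" as the zero fill; the helper names are hypothetical.

```python
WORD_LENGTH = 12  # character positions CP1..CP12


def cp(word: str, x: int) -> str:
    """Return character position x (1-based) of a 12-character word."""
    if len(word) != WORD_LENGTH or not 1 <= x <= WORD_LENGTH:
        raise ValueError("expected a 12-character word and 1 <= x <= 12")
    return word[x - 1]


def set_cp(word: str, x: int, ch: str) -> str:
    """Return a copy of word with character position x replaced by ch."""
    cp(word, x)  # validates the arguments
    return word[: x - 1] + ch + word[x:]


def object_found(gw2: str) -> bool:
    """Predicate Head: CP2 of GW2 > 0 means an object has been found."""
    return cp(gw2, 2) > "0"


def subject_found(gw2: str) -> bool:
    """Predicate Head: CP3 of GW2 > 0 means a subject has been found."""
    return cp(gw2, 3) > "0"


gw2 = "M" + "0" * 11                 # gender M in CP1, zero fill elsewhere
gw2_marked = set_cp(gw2, 3, "1")     # a subject routine records its success
print(subject_found(gw2), subject_found(gw2_marked), object_found(gw2_marked))
```

The value written into the flag position ("1" here) is an assumption; the table requires only that it be greater than zero.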




ESSENCE: Master/(essence)
   Predicted by:   1. Adj.
   Modified by:    nothing
   Grammatical information required:
                   1. Unambiguous case and number with zero fill,
                      using NAVI notation
                   2. Unambiguous gender with zero fill, using NAVI
                      notation (M,F,N,A,B,U,H)
                   3. Mark of Essence which predicted the Master
   Fulfilled by:   1. Adj.
                   2. Noun
                   3. Pronoun
                   4. Numeral
                      (1-4: with intersection in case, number, and
                      gender in AW1 and AW2 using NAVI notation)
   Marks:          1. mmiXM, where x-x is the marking of the essence
                      of the word predicting the Master

ESSENCE: Verb Master-E
   Predicted by:   1. Verb
                   2. Participle
   Modified by:    nothing
   Grammatical information required:
                   none
   Fulfilled by:   1. Verb with F in CP9 of AW1
   Marks:          1. AAVAMASTA

ESSENCE: Compound Verb Master-E
   Predicted by:   1. Verb Master-T
   Modified by:    1. Activated by Infinite Conj.
   Grammatical information required:
                   none
   Fulfilled by:   1. Verb with F in CP9 of AW1
   Marks:          1. ACVAMASTA

ESSENCE: Noun Complement-E
   Predicted by:   1. Noun
   Modified by:    nothing
   Grammatical information required:
                   none
   Fulfilled by:   1. Adj.
                   2. Noun
                   3. Pronoun
                   4. Numeral
   Marks:          1. AANACOMPA

ESSENCE: Compound Noun Complement-E
   Predicted by:   1. Noun Compl.-T
   Modified by:    1. Activated by Infinite Conj.
   Grammatical information required:
                   none
   Fulfilled by:   1. Adj.
                   2. Noun
                   3. Pronoun
                   4. Numeral
                      (1-4: with G in CP2 or CP8 of AW1)
   Marks:          1. ACNACOMPA

ESSENCE: Preposition Complement-E
   Predicted by:   1. Prep'n
   Modified by:    nothing
   Grammatical information required:
                   1. Unambiguous case and number with zero fill in
                      NAVI notation, i.e., if there is more than one
                      case and number possibility, each one is
                      considered as a separate prediction. Order of
                      predictions: same as order of listing in S0'.6o
   Fulfilled by:   1. Adj.
                   2. Noun
                   3. Pronoun
                   4. Numeral
                      (1-4: with intersection in case in AW1, using
                      NAVI notation)
   Marks:          1. AARACOMPA

ESSENCE: Compound Preposition Complement-E
   Predicted by:   1. Prep'n Compl.-T
   Modified by:    1. Activated by Infinite Conj.
   Grammatical information required:
                   1. Case of prep'n compl. in both singular and
                      plural positions
   Fulfilled by:   1. Noun
                   2. Adj.
                   3. Pronoun
                   4. Numeral
                      (1-4: with intersection in case in AW1, using
                      NAVI notation)
   Marks:          1. ACRACOMPA

ESSENCE: Phraser
   Predicted by:   1. Comma
                   2. Initial
   Modified by:    nothing
   Grammatical information required:
                   none
   Fulfilled by:   1. Participle
                   2. Verb with G in CP9 of AW1
   Marks:          1. AAFRASENA

TABLE F-5(a) (continued)




ESSENCE: Relative Conjunction-E
   Predicted by:   1. Comma
                   2. Initial
   Modified by:    nothing
   Grammatical information required:
                   none
   Fulfilled by:   1. Relative Conj.-T
   Marks:          1. AARACONJA

ESSENCE: Relative Pronoun-E
   Predicted by:   1. Comma
                   2. Initial
   Modified by:    nothing
   Grammatical information required:
                   none
   Fulfilled by:   1. Every Relative Pronoun fulfills this essence,
                      whether or not there has been previous success.
                      Upon fulfillment, the routine activates all
                      predictions in the prediction pool with
                      50 ≤ PSI ≤ 98. It does not call to the success
                      control routine, and continues as if there had
                      been no success.
   Marks:          1. Does not mark

ESSENCE: Infinity
   Predicted by:   1. Initial (inactive)
                   2. Comma (inactive)
                   3. Participle
                   4. Gerund
                   5. Pronoun Subj.
                   6. Adj.-Noun Subj.
                   7. Verb Subj.
                   8. Obj.-T
                   9. Left Obj.-T
                  10. Noun Compl.-T
                  11. Prep'n Compl.-T
                  12. Comma (twice)
                  13. Initial (four times)
   Modified by:    nothing
   Grammatical information required:
                   none
   Fulfilled by:   1. Prep'n
                   2. Adverb
                   3. Dollar Sign
                   4. Comma
                   5. Infinite Conj.
   Marks:          1. INFAPREPAAAA
                   2. INFAADVBAAAA
                   3. IliFAll^gAAAA
                   4. INFACOMMAAAA
                   5. INFACONJNCTA

ESSENCE: End Wipe
   Predicted by:   1. Gerund
                   2. Participle
                   3. Comma (inactive)
                   4. Initial
                   5. Pronoun Subj.
                   6. Adj.-Noun Subj.
                   7. Verb Subj.
                   8. Obj.-T
                   9. Left Obj.-T
                  10. Noun Compl.-T
                  11. Prep'n Compl.-T
                  12. Comma (twice)
                  13. Initial (PSI = 03) (three times)
   Modified by:    1. Activated by Relative Pronoun-E
                   2. Activated by Relative Conj.-T
   Grammatical information required:
                   none
   Fulfilled by:   1. "No success"
                      Wipes everything preceding in prediction pool
                      including itself and continues down prediction
                      pool. Writes all wiped PSI = 01 predictions on
                      Hindsight tape.
   Marks:          Does not mark

ESSENCE: End of Sentence-E
   Predicted by:   1. Initial
   Modified by:    nothing
   Grammatical information required:
                   none
   Fulfilled by:   1. End of Sentence-T
   Marks:          1. ENDAOFASENT. if . in CP1 of OW
                   2. SEMICOLONAA if ; in CP1 of OW

ESSENCE: Arbitrary Choice
   Predicted by:   1. Initial
   Modified by:    nothing
   Grammatical information required:
                   none
   Fulfilled by:   1. Adj.
                   2. Noun
                   3. Pronoun
                   4. Verb
                   5. Numeral
                   6. Others not accepted by Infinity or Regular
                      Essences, which do not make predictions
   Marks:          1. AAAPJTRAA
                      Note: Increase CHAIN by one.

TABLE F-5(a) (continued)
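
The action taken when End Wipe is reached can be sketched as below. This is an illustration only. It assumes that the prediction pool can be modeled as a Python list whose index 0 is the top of the pool, so that "everything preceding" means all entries at lower indices; each prediction is reduced to a hypothetical `(essence, PSI)` pair, and the hindsight tape is modeled as a returned list.

```python
def end_wipe(pool, position):
    """Wipe pool[0..position] (the End Wipe entry included).

    Returns the remaining pool and the essences of the wiped PSI = 01
    predictions, which would be written on the hindsight tape.
    """
    wiped = pool[: position + 1]
    remaining = pool[position + 1 :]
    hindsight = [essence for essence, psi in wiped if psi == 1]
    return remaining, hindsight


pool = [
    ("Noun Complement-E", 0),
    ("Infinity", 2),
    ("Predicate Head", 1),
    ("End Wipe", 3),
    ("End of Sentence-E", 1),
]
remaining, hindsight = end_wipe(pool, position=3)
print(remaining, hindsight)
```

In this example the unfulfilled Predicate Head prediction is the only one recorded, because it is the only wiped entry with PSI = 01; the analysis then continues with whatever lies further down the pool.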


FUNCTION TYPE: Initial
   Characterized by:  nothing
   Accepted by:       nothing
   Called in by:      1. Program Initializer
                      2. End of Sentence-T
   Predicts:          1. Phraser
                      2. Infinity
                      3. End Wipe PSI = 03
                      4. Relative Conj.
                      5. Infinity
                      6. End Wipe PSI = 03
                      7. Relative Pronoun-E
                      8. Subj.-E (inactive)
                      9. Left Obj.-E (inactive)
                     10. Pred. Head (inactive)
                     11. Infinity (inactive)
                     12. End Wipe PSI = 03
                     13. Subj.-E
                     14. Left Obj.-E
                     15. Pred. Head
                     16. Infinity
                     17. End Wipe
                     18. End of Sentence-E
                     19. Infinity
                     20. Arbitrary Choice
   Other conditions:  1. Sets chain number to 00

FUNCTION TYPE: Noun
   Characterized by:  1. N in CP1 of OW
                      2. N in CP2 of OW, if P in CP1 of OW
   Accepted by:       1. Master/(essence)
                      2. Prep'n Compl.
                      3. Noun Compl.
                      4. Subj.-E
                      5. Left Obj.-E
                      6. Obj.-E
                      7. Comp'd Subj.-E
                      8. Comp'd Obj.-E
                      9. Comp'd Noun Compl.-E
                     10. Comp'd Left Obj.-E
                     11. Comp'd Prep'n Compl.-E
                     12. Arbitrary Choice
   Called in by:      1. Pronoun
                      2. Numeral
   Predicts:          1. Noun Compl.
   Other conditions:  If not master, and:
                      1. if subj. or comp'd subj., go to (a) Adj.-Noun
                         Subj. or (b) Pronoun Subj.
                      2. if obj. or comp'd obj., go to Obj.-T
                      3. if left obj. or comp'd left obj., go to
                         Left Obj.-T
                      4. if noun compl. or comp'd noun compl., go to
                         Noun Compl.-T
                      5. if prep'n compl. or comp'd prep'n compl.,
                         go to Prep'n Compl.-T

FUNCTION TYPE: Pronoun
   Characterized by:  1. P in CP1 of OW
   Accepted by:       1. Prep'n Compl.-E
                      2. Noun Compl.-E
                      3. Subj.-E
                      4. Obj.-E
                      5. Left Obj.-E
                      6. Comp'd Subj.-E
                      7. Comp'd Obj.-E
                      8. Comp'd Noun Compl.-E
                      9. Comp'd Left Obj.-E
                     10. Comp'd Prep'n Compl.-E
                     11. Arbitrary Choice
   Called in by:      nothing
   Predicts:          nothing
   Other conditions:  1. If N in CP2 of OW, go to Noun
                      2. If A in CP2 of OW, go to Adj.

Function Type Subroutines Used in the Experimental Predictive
Syntactic Analysis Program

TABLE F-5(b)
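
The "other conditions" of the Noun routine amount to a dispatch on the essence that the noun has just fulfilled. A minimal sketch follows. The function and table names are hypothetical, and the choice between Adj.-Noun Subj. and Pronoun Subj. is reduced to a single `is_pronoun` flag, which is an assumption; the table itself distinguishes the two by the codes in CP1 and CP2 of OW.

```python
# Essence fulfilled by the noun -> function type that receives control.
_TRANSFERS = {
    "Obj.-E": "Obj.-T",
    "Left Obj.-E": "Left Obj.-T",
    "Noun Compl.-E": "Noun Compl.-T",
    "Prep'n Compl.-E": "Prep'n Compl.-T",
}


def noun_transfer(essence, is_pronoun=False):
    """Return the routine the Noun routine goes to, or None if it goes nowhere."""
    if essence == "Master/(essence)":
        return None                      # "If not master, and: ..."
    if essence.startswith("Comp'd "):
        essence = essence[len("Comp'd "):]   # compound forms go the same way
    if essence == "Subj.-E":
        return "Pronoun Subj." if is_pronoun else "Adj.-Noun Subj."
    return _TRANSFERS.get(essence)       # e.g. Arbitrary Choice -> None


print(noun_transfer("Subj.-E"))
print(noun_transfer("Comp'd Left Obj.-E"))
```

The first call corresponds to the case walked through earlier in this appendix, where a noun selected as subject passes control to the adjective-noun subject routine.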


FUNCTION TYPE: Adjective
   Characterized by:  1. A in CP1 of OW, also CP9 of OW < 1
                      2. A in CP2 of OW, if P in CP1 of OW
   Accepted by:       1. Master/(essence)
                      2. Prep'n Compl.
                      3. Noun Compl.
                      4. Subj.-E
                      5. Obj.-E
                      6. Left Obj.-E
                      7. Comp'd Subj.-E
                      8. Comp'd Obj.-E
                      9. Comp'd Noun Compl.-E
                     10. Comp'd Left Obj.-E
                     11. Comp'd Prep'n Compl.-E
                     12. Arbitrary Choice
   Called in by:      1. Pronoun
   Predicts:          1. Master/(essence)
   Other conditions:  If not master, and:
                      1. if subj. or comp'd subj., go to Adj.-Noun Subj.
                      2. if obj. or comp'd obj., go to Obj.-T
                      3. if left obj. or comp'd left obj., go to
                         Left Obj.-T
                      4. if noun compl. or comp'd noun compl., go to
                         Noun Compl.-T
                      5. if prep'n compl. or comp'd prep'n compl.,
                         go to Prep'n Compl.-T

FUNCTION TYPE: Numeral
   Characterized by:  1. D in CP1 of OW
   Accepted by:       1. Master/(essence)
                      2. Prep'n Compl.-E
                      3. Noun Compl.-E
                      4. Subj.-E
                      5. Obj.-E
                      6. Left Obj.-E
                      7. Comp'd Subj.-E
                      8. Comp'd Obj.-E
                      9. Comp'd Noun Compl.-E
                     10. Comp'd Left Obj.-E
                     11. Comp'd Prep'n Compl.-E
                     12. Arbitrary Choice
   Called in by:      nothing
   Predicts:          1. If A in CP2 of OW:
                         (a) Master/(essence) with PSI = 00 with case
                             and number determined by position (using
                             NAVI notation) in TW8, of any gender
                      2. If N in CP2 of OW:
                         (a) nothing
   Other conditions:  1. If A in CP2 of OW, and not master:
                         (a) if subj. or comp'd subj., go to
                             Adj.-Noun Subj.
                         (b) if obj. or comp'd obj., go to Obj.-T
                         (c) if left obj. or comp'd left obj., go to
                             Left Obj.-T
                         (d) if noun compl. or comp'd noun compl.,
                             go to Noun Compl.-T
                         (e) if prep'n compl. or comp'd prep'n compl.,
                             go to Prep'n Compl.-T
                      2. If N in CP2 of OW:
                         (a) go to Noun

FUNCTION TYPE: Verb
   Characterized by:  1. V in CP1 of OW
   Accepted by:       1. Pred. Head
                      2. Subj.-E
                      3. Verb Master-E
                      4. Phraser
                      5. Comp'd Subj.-E
                      6. Comp'd Pred. Head
                      7. Comp'd Verb Master-E
                      8. Arbitrary Choice
   Called in by:      nothing
   Predicts:          1. Verb Master-E
                      2. Obj.-E
                         (a) if R in CP10 of AW1, then only
                             Instrumental
                         (b) if no government prediction in OW, then
                             only Accusative
                         (c) if Pred. Head, and left obj. has been
                             found, do not predict Obj.-E
   Other conditions:  1. If subj. or comp'd subj., go to Verb Subj.
                      2. If pred. head or comp'd pred. head, go to
                         Verb Pred. Head
                      3. If G in CP9 of AW1, go to Gerund
                      4. If verb master or comp'd verb master, go to
                         Verb Master-T

FUNCTION TYPE: Adverb
   Characterized by:  1. H in CP1 of OW
                      2. A in CP1 of OW and 2 or 3 in CP9 of OW
                      3. A in CP1 of OW and 1 in CP6 of OW
   Accepted by:       1. Infinity
   Called in by:      nothing
   Predicts:          nothing
   Other conditions:  1. Inhibits wiping of Prediction Pool

FUNCTION TYPE: Preposition
   Characterized by:  1. R in CP1 of OW
   Accepted by:       1. Infinity
   Predicts:          1. Prep'n Compl.
   Other conditions:  1. Calls back to itself for making more than
                         one unique prediction of prep'n compl.

TABLE F-5(b) (continued)
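
The three conditions attached to the Verb routine's Obj.-E prediction can be read as a small decision procedure. The sketch below is one possible reading, not the program's actual logic: the function name and arguments are hypothetical, the order in which the conditions are tested is an assumption, and cases are written out as words rather than in the program's coded form.

```python
def predicted_object_cases(cp10_of_aw1, government_cases,
                           is_pred_head, left_object_found):
    """Return the list of cases for which Obj.-E would be predicted."""
    # (c) A predicate head whose left object is already found gets no Obj.-E.
    if is_pred_head and left_object_found:
        return []
    # (a) R in CP10 of AW1: only an instrumental object is predicted.
    if cp10_of_aw1 == "R":
        return ["instrumental"]
    # (b) No government information in the word: only accusative.
    if not government_cases:
        return ["accusative"]
    # Otherwise one prediction per governed case (assumed behavior).
    return list(government_cases)


print(predicted_object_cases("0", [], False, False))
print(predicted_object_cases("R", ["accusative"], True, False))
print(predicted_object_cases("0", ["genitive"], True, True))
```

Condition (c) is tested first here so that a left object already found suppresses the prediction regardless of the verb's government codes; the table does not state the precedence.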


FUNCTION TYPE: Participle
   Characterized by:  1. A in CP1 of OW and > 0 in CP10 of OW and
                         not > 0 in CP6 and CP9 of OW
   Accepted by:       1. Phraser
   Called in by:      nothing
   Predicts:          1. Verb Master
                      2. Obj.-E (government in CP1 and 2 of SOW3;
                         if no P-code, accusative Obj. predicted)
                      3. Infinity
                      4. End Wipe
   Other conditions:  none

FUNCTION TYPE: Gerund
   Characterized by:  1. G in CP9 of AW1 and V in CP1 of OW
   Accepted by:       nothing
   Called in by:      1. Verb
   Predicts:          1. Infinity
                      2. End Wipe
   Other conditions:  none

FUNCTION TYPE: Infinite Conjunction
   Characterized by:  1. and •Vijot"
   Accepted by:       1. Infinity
   Called in by:      nothing
   Predicts:          nothing
   Other conditions:  1. After normal wiping of prediction pool,
                         activate all predictions for which PSI = 99.

FUNCTION TYPE: Relative Conjunction-T
   Characterized by:  1. C in CP1 of OW; if tti'i or check
                         prediction pool for unfulfilled Subj.,
                         Pred. Head, or Obj. If none, accept,
                         otherwise reject.
   Accepted by:       1. Relative Conj.-E
   Called in by:      nothing
   Predicts:          1. Activates all inactive predictions
                         (50 ≤ PSI ≤ 98)
   Other conditions:  none

FUNCTION TYPE: Comma
   Characterized by:  1. , in CP1 of OW
   Accepted by:       1. Infinity
   Called in by:      nothing
   Predicts:          1. Phraser
                      2. Infinity
                      3. End Wipe PSI = 03
                      4. Relative Conj.
                      5. Infinity
                      6. End Wipe PSI = 03
                      7. Relative Pronoun-E
                      8. Subj.-E (inactive)
                      9. Left Obj.-E (inactive)
                     10. Pred. Head (inactive)
                     11. Infinity active (03)
                     12. End Wipe active (03)
   Other conditions:  1. Before making predictions, wipe all
                         predictions in pool with 50 ≤ PSI ≤ 98.

FUNCTION TYPE: Adjective-Noun Subject
   Characterized by:  1. i-iSUBJCTA in TW9 and neither V in CP1 of
                         OW nor PN in CP1 and 2 of OW
   Accepted by:       nothing
   Called in by:      1. Noun
                      2. Adj.
   Predicts:          1. Modifies Pred. Head (if it has not been
                         fulfilled) to 3rd person, and to number and
                         gender of selected function, and puts > 0 in
                         CP3 of GW2. If comp'd subj., modify to 3rd
                         person plural any gender. See "Pred. Head"
                         for format information.
                      2. Comp'd Subj.-E with any person, number, and
                         gender.
                      3. Infinity
                      4. End Wipe
   Other conditions:  none

TABLE F-5(b) (continued)




FUNCTION TYPE: Pronoun Subject
   Characterized by:  1. i-iSUBJCTA in TW9 and PN in CP1 and 2 of OW
   Accepted by:       nothing
   Called in by:      1. Noun
   Predicts:          1. Modifies Pred. Head (if it has not been
                         fulfilled) as to person, number, and gender
                         of pronoun and puts > 0 in CP3 of GW2. If
                         comp'd subj., modify to 3rd person plural.
                         See "Pred. Head" for format information.
                      2. Comp'd Subj.-E with any person, number, and
                         gender.
                      3. Infinity
                      4. End Wipe

FUNCTION TYPE: Verb Subject
   Characterized by:  1. i-iSUBJCTA in TW9 and V in CP1 of OW
   Accepted by:       nothing
   Called in by:      1. Verb
   Predicts:          1. Modifies Pred. Head (if it has not been
                         fulfilled) to 3rd person, neuter, singular,
                         and puts > 0 in CP3 of GW2. See "Pred. Head"
                         for format information.
                      2. Comp'd Subj.-E with infinitive
                      3. Infinity
                      4. End Wipe

FUNCTION TYPE: Verb Predicate Head
   Characterized by:  1. i-iVAPREDA in TW9
   Accepted by:       nothing
   Called in by:      1. Verb
   Predicts:          1. Modifies Subj.-E (if it has not been
                         fulfilled) as to person, number, gender and
                         puts > 0 in CP3 of GW2. See "Subj.-E" for
                         format information.
                      2. Erases Left Obj.-E, if it has not been
                         fulfilled
                      3. Comp'd Pred. Head with person, number, and
                         gender same as Verb Pred. Head

FUNCTION TYPE: Adjective Predicate Head
   Characterized by:  1. A in CP1 of OW and 1 or 2 in CP9 of OW
   Accepted by:       1. Pred. Head
   Called in by:      nothing
   Predicts:          1. Comp'd Pred. Head with person, number, and
                         gender same as Adj. Pred. Head
                      2. Erases Left Obj. if it has not been
                         fulfilled
                      3. Put > 0 in CP3 of GW2 of Subj.-E if not
                         fulfilled

FUNCTION TYPE: Left Object-T
   Characterized by:  1. i-iLAOBJAA in TW9
   Accepted by:       nothing
   Called in by:      1. Noun
                      2. Adj.
   Predicts:          1. Puts > 0 in CP2 of GW2 of Pred. Head
                      2. Comp'd Left Obj.-E with same case as
                         Left Obj.-T
                      3. Infinity
                      4. End Wipe

FUNCTION TYPE: Object-T
   Characterized by:  1. i-iOBJECTA in TW9
   Accepted by:       nothing
   Called in by:      1. Noun
                      2. Adj.
   Predicts:          1. Comp'd Obj.-E with same case as Obj.-T
                      2. Infinity
                      3. End Wipe
   Other conditions:  none

TABLE F-5(b) (continued)


FUNCTION TYPE: Noun Complement-T
   Characterized by:  1. i-iNACOMPA in TW9
   Accepted by:       nothing
   Called in by:      1. Adj.
                      2. Noun
   Predicts:          1. Comp'd Noun Compl.-E
                      2. Infinity
                      3. End Wipe
   Other conditions:  none

FUNCTION TYPE: Preposition Complement-T
   Characterized by:  1. i-iRACOMPA in TW9
   Accepted by:       nothing
   Called in by:      1. Adj.
                      2. Noun
   Predicts:          1. Comp'd Prep. Compl.-E with same case
                      2. Infinity
                      3. End Wipe
   Other conditions:  none

FUNCTION TYPE: Verb Master-T
   Characterized by:  1. i-iVAMASTA in TW9
   Accepted by:       nothing
   Called in by:      1. Verb
   Predicts:          1. Comp'd Verb Master-E
   Other conditions:  none

FUNCTION TYPE: $
   Characterized by:  1. $ in CP1 of Russian word
   Accepted by:       1. Infinity
   Called in by:      nothing
   Predicts:          nothing
   Other conditions:  1. Inhibits wiping of prediction pool.

FUNCTION TYPE: End of Sentence-T
   Characterized by:  1. . in OW, CP1
                      2. ; in OW, CP1
   Accepted by:       1. End of Sentence-E
   Called in by:      nothing
   Predicts:          nothing
   Other conditions:  1. Wipes prediction pool completely.
                         Put space blockette on Output.
                      2. Goes to Initial

TABLE F-5(b) (continued)